Numerical Recipes

Transcript

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to [email protected] (outside North America).

Numerical Recipes in C
The Art of Scientific Computing
Second Edition

William H. Press, Harvard-Smithsonian Center for Astrophysics
Saul A. Teukolsky, Department of Physics, Cornell University
William T. Vetterling, Polaroid Corporation
Brian P. Flannery, EXXON Research and Engineering Company

CAMBRIDGE UNIVERSITY PRESS
Cambridge New York Port Chester Melbourne Sydney

Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC, 3207, Australia

Copyright © Cambridge University Press 1988, 1992, except for §13.10 and Appendix B, which are placed into the public domain, and except for all other computer programs and procedures, which are Copyright © Numerical Recipes Software 1987, 1988, 1992, 1997, 2002. All Rights Reserved.

Some sections of this book were originally published, in different form, in Computers in Physics magazine, Copyright © American Institute of Physics, 1988–1992.

First Edition originally published 1988; Second Edition originally published 1992. Reprinted with corrections, 1993, 1994, 1995, 1997, 2002. This reprinting is corrected to software version 2.10.

Printed in the United States of America. Typeset in TeX.

Without an additional license to use the contained software, this book is intended as a text and reference book, for reading purposes only. A free license for limited use of the software by the individual owner of a copy of this book who personally types one or more routines into a single computer is granted under terms described on p. xvii. See the section “License Information” (pp. xvi–xviii) for information on obtaining more general licenses at low cost.

Machine-readable media containing the software in this book, with included licenses for use on a single screen, are available from Cambridge University Press. See the order form at the back of the book, email to “[email protected]” (North America) or “[email protected]” (rest of world), or write to Cambridge University Press, 110 Midland Avenue, Port Chester, NY 10573 (USA), for further information. The software may also be downloaded, with immediate purchase of a license also possible, from the Numerical Recipes Software Web Site (http://www.nr.com). Unlicensed transfer of Numerical Recipes programs to any other format, or to any computer except one that is specifically licensed, is strictly prohibited. Technical questions, corrections, and requests for information should be addressed to Numerical Recipes Software, P.O. Box 380243, Cambridge, MA 02238-0243 (USA), email “[email protected]”, or fax 781 863-1739.

Library of Congress Cataloging in Publication Data
Numerical recipes in C : the art of scientific computing / William H. Press ... [et al.]. – 2nd ed.
Includes bibliographical references (p. ) and index.
ISBN 0-521-43108-5
1. Numerical analysis–Computer programs. 2. Science–Mathematics–Computer programs. 3. C (Computer program language) I. Press, William H.
QA297.N866 1992
519.4′0285′53–dc20   92-8876
A catalog record for this book is available from the British Library.
ISBN 0 521 43108 5   Book
ISBN 0 521 43720 2   Example book in C
ISBN 0 521 75037 7   C/C++ CDROM (Windows/Macintosh)
ISBN 0 521 75035 0   Complete CDROM (Windows/Macintosh)
ISBN 0 521 75036 9   Complete CDROM (UNIX/Linux)

Contents

Preface to the Second Edition xi
Preface to the First Edition xiv
License Information xvi
Computer Programs by Chapter and Section xix

1 Preliminaries 1
1.0 Introduction 1
1.1 Program Organization and Control Structures 5
1.2 Some C Conventions for Scientific Computing 15
1.3 Error, Accuracy, and Stability 28

2 Solution of Linear Algebraic Equations 32
2.0 Introduction 32
2.1 Gauss-Jordan Elimination 36
2.2 Gaussian Elimination with Backsubstitution 41
2.3 LU Decomposition and Its Applications 43
2.4 Tridiagonal and Band Diagonal Systems of Equations 50
2.5 Iterative Improvement of a Solution to Linear Equations 55
2.6 Singular Value Decomposition 59
2.7 Sparse Linear Systems 71
2.8 Vandermonde Matrices and Toeplitz Matrices 90
2.9 Cholesky Decomposition 96
2.10 QR Decomposition 98
2.11 Is Matrix Inversion an N³ Process? 102

3 Interpolation and Extrapolation 105
3.0 Introduction 105
3.1 Polynomial Interpolation and Extrapolation 108
3.2 Rational Function Interpolation and Extrapolation 111
3.3 Cubic Spline Interpolation 113
3.4 How to Search an Ordered Table 117
3.5 Coefficients of the Interpolating Polynomial 120
3.6 Interpolation in Two or More Dimensions 123

4 Integration of Functions 129
4.0 Introduction 129
4.1 Classical Formulas for Equally Spaced Abscissas 130
4.2 Elementary Algorithms 136
4.3 Romberg Integration 140
4.4 Improper Integrals 141
4.5 Gaussian Quadratures and Orthogonal Polynomials 147
4.6 Multidimensional Integrals 161

5 Evaluation of Functions 165
5.0 Introduction 165
5.1 Series and Their Convergence 165
5.2 Evaluation of Continued Fractions 169
5.3 Polynomials and Rational Functions 173
5.4 Complex Arithmetic 176
5.5 Recurrence Relations and Clenshaw’s Recurrence Formula 178
5.6 Quadratic and Cubic Equations 183
5.7 Numerical Derivatives 186
5.8 Chebyshev Approximation 190
5.9 Derivatives or Integrals of a Chebyshev-approximated Function 195
5.10 Polynomial Approximation from Chebyshev Coefficients 197
5.11 Economization of Power Series 198
5.12 Padé Approximants 200
5.13 Rational Chebyshev Approximation 204
5.14 Evaluation of Functions by Path Integration 208

6 Special Functions 212
6.0 Introduction 212
6.1 Gamma Function, Beta Function, Factorials, Binomial Coefficients 213
6.2 Incomplete Gamma Function, Error Function, Chi-Square Probability Function, Cumulative Poisson Function 216
6.3 Exponential Integrals 222
6.4 Incomplete Beta Function, Student’s Distribution, F-Distribution, Cumulative Binomial Distribution 226
6.5 Bessel Functions of Integer Order 230
6.6 Modified Bessel Functions of Integer Order 236
6.7 Bessel Functions of Fractional Order, Airy Functions, Spherical Bessel Functions 240
6.8 Spherical Harmonics 252
6.9 Fresnel Integrals, Cosine and Sine Integrals 255
6.10 Dawson’s Integral 259
6.11 Elliptic Integrals and Jacobian Elliptic Functions 261
6.12 Hypergeometric Functions 271

7 Random Numbers 274
7.0 Introduction 274
7.1 Uniform Deviates 275

7.2 Transformation Method: Exponential and Normal Deviates 287
7.3 Rejection Method: Gamma, Poisson, Binomial Deviates 290
7.4 Generation of Random Bits 296
7.5 Random Sequences Based on Data Encryption 300
7.6 Simple Monte Carlo Integration 304
7.7 Quasi- (that is, Sub-) Random Sequences 309
7.8 Adaptive and Recursive Monte Carlo Methods 316

8 Sorting 329
8.0 Introduction 329
8.1 Straight Insertion and Shell’s Method 330
8.2 Quicksort 332
8.3 Heapsort 336
8.4 Indexing and Ranking 338
8.5 Selecting the Mth Largest 341
8.6 Determination of Equivalence Classes 345

9 Root Finding and Nonlinear Sets of Equations 347
9.0 Introduction 347
9.1 Bracketing and Bisection 350
9.2 Secant Method, False Position Method, and Ridders’ Method 354
9.3 Van Wijngaarden–Dekker–Brent Method 359
9.4 Newton-Raphson Method Using Derivative 362
9.5 Roots of Polynomials 369
9.6 Newton-Raphson Method for Nonlinear Systems of Equations 379
9.7 Globally Convergent Methods for Nonlinear Systems of Equations 383

10 Minimization or Maximization of Functions 394
10.0 Introduction 394
10.1 Golden Section Search in One Dimension 397
10.2 Parabolic Interpolation and Brent’s Method in One Dimension 402
10.3 One-Dimensional Search with First Derivatives 405
10.4 Downhill Simplex Method in Multidimensions 408
10.5 Direction Set (Powell’s) Methods in Multidimensions 412
10.6 Conjugate Gradient Methods in Multidimensions 420
10.7 Variable Metric Methods in Multidimensions 425
10.8 Linear Programming and the Simplex Method 430
10.9 Simulated Annealing Methods 444

11 Eigensystems 456
11.0 Introduction 456
11.1 Jacobi Transformations of a Symmetric Matrix 463
11.2 Reduction of a Symmetric Matrix to Tridiagonal Form: Givens and Householder Reductions 469
11.3 Eigenvalues and Eigenvectors of a Tridiagonal Matrix 475
11.4 Hermitian Matrices 481
11.5 Reduction of a General Matrix to Hessenberg Form 482

11.6 The QR Algorithm for Real Hessenberg Matrices 486
11.7 Improving Eigenvalues and/or Finding Eigenvectors by Inverse Iteration 493

12 Fast Fourier Transform 496
12.0 Introduction 496
12.1 Fourier Transform of Discretely Sampled Data 500
12.2 Fast Fourier Transform (FFT) 504
12.3 FFT of Real Functions, Sine and Cosine Transforms 510
12.4 FFT in Two or More Dimensions 521
12.5 Fourier Transforms of Real Data in Two and Three Dimensions 525
12.6 External Storage or Memory-Local FFTs 532

13 Fourier and Spectral Applications 537
13.0 Introduction 537
13.1 Convolution and Deconvolution Using the FFT 538
13.2 Correlation and Autocorrelation Using the FFT 545
13.3 Optimal (Wiener) Filtering with the FFT 547
13.4 Power Spectrum Estimation Using the FFT 549
13.5 Digital Filtering in the Time Domain 558
13.6 Linear Prediction and Linear Predictive Coding 564
13.7 Power Spectrum Estimation by the Maximum Entropy (All Poles) Method 572
13.8 Spectral Analysis of Unevenly Sampled Data 575
13.9 Computing Fourier Integrals Using the FFT 584
13.10 Wavelet Transforms 591
13.11 Numerical Use of the Sampling Theorem 606

14 Statistical Description of Data 609
14.0 Introduction 609
14.1 Moments of a Distribution: Mean, Variance, Skewness, and So Forth 610
14.2 Do Two Distributions Have the Same Means or Variances? 615
14.3 Are Two Distributions Different? 620
14.4 Contingency Table Analysis of Two Distributions 628
14.5 Linear Correlation 636
14.6 Nonparametric or Rank Correlation 639
14.7 Do Two-Dimensional Distributions Differ? 645
14.8 Savitzky-Golay Smoothing Filters 650

15 Modeling of Data 656
15.0 Introduction 656
15.1 Least Squares as a Maximum Likelihood Estimator 657
15.2 Fitting Data to a Straight Line 661
15.3 Straight-Line Data with Errors in Both Coordinates 666
15.4 General Linear Least Squares 671
15.5 Nonlinear Models 681

15.6 Confidence Limits on Estimated Model Parameters 689
15.7 Robust Estimation 699

16 Integration of Ordinary Differential Equations 707
16.0 Introduction 707
16.1 Runge-Kutta Method 710
16.2 Adaptive Stepsize Control for Runge-Kutta 714
16.3 Modified Midpoint Method 722
16.4 Richardson Extrapolation and the Bulirsch-Stoer Method 724
16.5 Second-Order Conservative Equations 732
16.6 Stiff Sets of Equations 734
16.7 Multistep, Multivalue, and Predictor-Corrector Methods 747

17 Two Point Boundary Value Problems 753
17.0 Introduction 753
17.1 The Shooting Method 757
17.2 Shooting to a Fitting Point 760
17.3 Relaxation Methods 762
17.4 A Worked Example: Spheroidal Harmonics 772
17.5 Automated Allocation of Mesh Points 783
17.6 Handling Internal Boundary Conditions or Singular Points 784

18 Integral Equations and Inverse Theory 788
18.0 Introduction 788
18.1 Fredholm Equations of the Second Kind 791
18.2 Volterra Equations 794
18.3 Integral Equations with Singular Kernels 797
18.4 Inverse Problems and the Use of A Priori Information 804
18.5 Linear Regularization Methods 808
18.6 Backus-Gilbert Method 815
18.7 Maximum Entropy Image Restoration 818

19 Partial Differential Equations 827
19.0 Introduction 827
19.1 Flux-Conservative Initial Value Problems 834
19.2 Diffusive Initial Value Problems 847
19.3 Initial Value Problems in Multidimensions 853
19.4 Fourier and Cyclic Reduction Methods for Boundary Value Problems 857
19.5 Relaxation Methods for Boundary Value Problems 863
19.6 Multigrid Methods for Boundary Value Problems 871

20 Less-Numerical Algorithms 889
20.0 Introduction 889
20.1 Diagnosing Machine Parameters 889
20.2 Gray Codes 894

20.3 Cyclic Redundancy and Other Checksums 896
20.4 Huffman Coding and Compression of Data 903
20.5 Arithmetic Coding 910
20.6 Arithmetic at Arbitrary Precision 915

References 926
Appendix A: Table of Prototype Declarations 930
Appendix B: Utility Routines 940
Appendix C: Complex Arithmetic 948
Index of Programs and Dependencies 951
General Index 965

Preface to the Second Edition

Our aim in writing the original edition of Numerical Recipes was to provide a book that combined general discussion, analytical mathematics, algorithmics, and actual working programs. The success of the first edition puts us now in a difficult, though hardly unenviable, position. We wanted, then and now, to write a book that is informal, fearlessly editorial, unesoteric, and above all useful. There is a danger that, if we are not careful, we might produce a second edition that is weighty, balanced, scholarly, and boring.

It is a mixed blessing that we know more now than we did six years ago. Then, we were making educated guesses, based on existing literature and our own research, about which numerical techniques were the most important and robust. Now, we have the benefit of direct feedback from a large reader community. Letters to our alter-ego enterprise, Numerical Recipes Software, are in the thousands per year. (Please, don’t telephone us.) Our post office box has become a magnet for letters pointing out that we have omitted some particular technique, well known to be important in a particular field of science or engineering. We value such letters, and digest them carefully, especially when they point us to specific references in the literature.

The inevitable result of this input is that this Second Edition of Numerical Recipes is substantially larger than its predecessor, in fact about 50% larger both in words and number of included programs (the latter now numbering well over 300). “Don’t let the book grow in size,” is the advice that we received from several wise colleagues. We have tried to follow the intended spirit of that advice, even as we violate the letter of it. We have not lengthened, or increased in difficulty, the book’s principal discussions of mainstream topics. Many new topics are presented at this same accessible level. Some topics, both from the earlier edition and new to this one, are now set in smaller type that labels them as being “advanced.” The reader who ignores such advanced sections completely will not, we think, find any lack of continuity in the shorter volume that results.

Here are some highlights of the new material in this Second Edition:

• a new chapter on integral equations and inverse methods
• a detailed treatment of multigrid methods for solving elliptic partial differential equations
• routines for band diagonal linear systems
• improved routines for linear algebra on sparse matrices
• Cholesky and QR decomposition
• orthogonal polynomials and Gaussian quadratures for arbitrary weight functions
• methods for calculating numerical derivatives
• Padé approximants, and rational Chebyshev approximation
• Bessel functions, and modified Bessel functions, of fractional order; and several other new special functions
• improved random number routines
• quasi-random sequences
• routines for adaptive and recursive Monte Carlo integration in high-dimensional spaces
• globally convergent methods for sets of nonlinear equations

• simulated annealing minimization for continuous control spaces
• fast Fourier transform (FFT) for real data in two and three dimensions
• fast Fourier transform (FFT) using external storage
• improved fast cosine transform routines
• wavelet transforms
• Fourier integrals with upper and lower limits
• spectral analysis on unevenly sampled data
• Savitzky-Golay smoothing filters
• fitting straight line data with errors in both coordinates
• a two-dimensional Kolmogorov-Smirnoff test
• the statistical bootstrap method
• embedded Runge-Kutta-Fehlberg methods for differential equations
• high-order methods for stiff differential equations
• a new chapter on “less-numerical” algorithms, including Huffman and arithmetic coding, arbitrary precision arithmetic, and several other topics.

Consult the Preface to the First Edition, following, or the Table of Contents, for a list of the more “basic” subjects treated.

Acknowledgments

It is not possible for us to list by name here all the readers who have made useful suggestions; we are grateful for these. In the text, we attempt to give specific attribution for ideas that appear to be original, and not known in the literature. We apologize in advance for any omissions.

Some readers and colleagues have been particularly generous in providing us with ideas, comments, suggestions, and programs for this Second Edition. We especially want to thank George Rybicki, Philip Pinto, Peter Lepage, Robert Lupton, Douglas Eardley, Ramesh Narayan, David Spergel, Alan Oppenheim, Sallie Baliunas, Scott Tremaine, Glennys Farrar, Steven Block, John Peacock, Thomas Loredo, Matthew Choptuik, Gregory Cook, L. Samuel Finn, P. Deuflhard, Harold Lewis, Peter Weinberger, David Syer, Richard Ferch, Steven Ebstein, Bradley Keister, and William Gould. We have been helped by Nancy Lee Snyder’s mastery of a complicated TeX manuscript. We express appreciation to our editors Lauren Cowles and Alan Harvey at Cambridge University Press, and to our production editor Russell Hahn. We remain, of course, grateful to the individuals acknowledged in the Preface to the First Edition.

Special acknowledgment is due to programming consultant Seth Finkelstein, who wrote, rewrote, or influenced many of the routines in this book, as well as in its FORTRAN-language twin and the companion Example books. Our project has benefited enormously from Seth’s talent for detecting, and following the trail of, even very slight anomalies (often compiler bugs, but occasionally our errors), and from his good programming sense. To the extent that this edition of Numerical Recipes in C has a more graceful and “C-like” programming style than its predecessor, most of the credit goes to Seth. (Of course, we accept the blame for the FORTRANish lapses that still remain.)

We prepared this book for publication on DEC and Sun workstations running the UNIX operating system, and on a 486/33 PC compatible running MS-DOS 5.0/Windows 3.0. (See §1.0 for a list of additional computers used in program tests.) We enthusiastically recommend the principal software used: GNU Emacs, TeX, Perl, Adobe Illustrator, and PostScript. Also used were a variety of C compilers – too numerous (and sometimes too buggy) for individual acknowledgment. It is a sobering fact that our standard test suite (exercising all the routines in this book) has uncovered compiler bugs in many of the compilers tried. When possible, we work with developers to see that such bugs get fixed; we encourage interested compiler developers to contact us about such arrangements.

WHP and SAT acknowledge the continued support of the U.S. National Science Foundation for their research on computational methods. D.A.R.P.A. support is acknowledged for §13.10 on wavelets.

June, 1992
William H. Press
Saul A. Teukolsky
William T. Vetterling
Brian P. Flannery

Preface to the First Edition

We call this book Numerical Recipes for several reasons. In one sense, this book is indeed a “cookbook” on numerical computation. However there is an important distinction between a cookbook and a restaurant menu. The latter presents choices among complete dishes in each of which the individual flavors are blended and disguised. The former — and this book — reveals the individual ingredients and explains how they are prepared and combined.

Another purpose of the title is to connote an eclectic mixture of presentational techniques. This book is unique, we think, in offering, for each topic considered, a certain amount of general discussion, a certain amount of analytical mathematics, a certain amount of discussion of algorithmics, and (most important) actual implementations of these ideas in the form of working computer routines. Our task has been to find the right balance among these ingredients for each topic. You will find that for some topics we have tilted quite far to the analytic side; this is where we have felt there to be gaps in the “standard” mathematical training. For other topics, where the mathematical prerequisites are universally held, we have tilted towards more in-depth discussion of the nature of the computational algorithms, or towards practical questions of implementation.

We admit, therefore, to some unevenness in the “level” of this book. About half of it is suitable for an advanced undergraduate course on numerical computation for science or engineering majors. The other half ranges from the level of a graduate course to that of a professional reference. Most cookbooks have, after all, recipes at varying levels of complexity. An attractive feature of this approach, we think, is that the reader can use the book at increasing levels of sophistication as his/her experience grows. Even inexperienced readers should be able to use our most advanced routines as black boxes. Having done so, we hope that these readers will subsequently go back and learn what secrets are inside.

If there is a single dominant theme in this book, it is that practical methods of numerical computation can be simultaneously efficient, clever, and — important — clear. The alternative viewpoint, that efficient computational methods must necessarily be so arcane and complex as to be useful only in “black box” form, we firmly reject. Our purpose in this book is thus to open up a large number of computational black boxes to your scrutiny. We want to teach you to take apart these black boxes and to put them back together again, modifying them to suit your specific needs. We assume that you are mathematically literate, i.e., that you have the normal mathematical preparation associated with an undergraduate degree in a physical science, or engineering, or economics, or a quantitative social science. We assume that you know how to program a computer.

We do not assume that you have any prior formal knowledge of numerical analysis or numerical methods. The scope of Numerical Recipes is supposed to be “everything up to, but not including, partial differential equations.” We honor this in the breach: First, we do have one introductory chapter on methods for partial differential equations (Chapter 19). Second, we obviously cannot include everything else. All the so-called “standard” topics of a numerical analysis course have been included in this book: linear equations (Chapter 2), interpolation and extrapolation (Chapter 3), integration (Chapter 4), nonlinear root-finding (Chapter 9), eigensystems (Chapter 11), and ordinary differential equations (Chapter 16). Most of these topics have been taken beyond their standard treatments into some advanced material which we have felt to be particularly important or useful.

Some other subjects that we cover in detail are not usually found in the standard numerical analysis texts. These include the evaluation of functions and of particular special functions of higher mathematics (Chapters 5 and 6); random numbers and Monte Carlo methods (Chapter 7); sorting (Chapter 8); optimization, including multidimensional methods (Chapter 10); Fourier transform methods, including FFT methods and other spectral methods (Chapters 12 and 13); two chapters on the statistical description and modeling of data (Chapters 14 and 15); and two-point boundary value problems, both shooting and relaxation methods (Chapter 17).

The programs in this book are included in ANSI-standard C. Versions of the book in FORTRAN, Pascal, and BASIC are available separately. We have more to say about the C language, and the computational environment assumed by our routines, in §1.1 (Introduction).

Acknowledgments

Many colleagues have been generous in giving us the benefit of their numerical and computational experience, in providing us with programs, in commenting on the manuscript, or in general encouragement. We particularly wish to thank George Rybicki, Douglas Eardley, Philip Marcus, Stuart Shapiro, Paul Horowitz, Bruce Musicus, Irwin Shapiro, Stephen Wolfram, Henry Abarbanel, Larry Smarr, Richard Muller, John Bahcall, and A.G.W. Cameron.

We also wish to acknowledge two individuals whom we have never met: Forman Acton, whose 1970 textbook Numerical Methods that Work (New York: Harper and Row) has surely left its stylistic mark on us; and Donald Knuth, both for his series of books on The Art of Computer Programming (Reading, MA: Addison-Wesley), and for TeX, the computer typesetting language which immensely aided production of this book.

Research by the authors on computational methods was supported in part by the U.S. National Science Foundation.

October, 1985
William H. Press
Brian P. Flannery
Saul A. Teukolsky
William T. Vetterling

License Information

Read this section if you want to use the programs in this book on a computer. You’ll need to read the following Disclaimer of Warranty, get the programs onto your computer, and acquire a Numerical Recipes software license. (Without this license, which can be the free “immediate license” under terms described below, the book is intended as a text and reference book, for reading purposes only.)

Disclaimer of Warranty

We make no warranties, express or implied, that the programs contained in this volume are free of error, or are consistent with any particular standard of merchantability, or that they will meet your requirements for any particular application. They should not be relied on for solving a problem whose incorrect solution could result in injury to a person or loss of property. If you do use the programs in such a manner, it is at your own risk. The authors and publisher disclaim all liability for direct or consequential damages resulting from your use of the programs.

How to Get the Code onto Your Computer

Pick one of the following methods:

• You can type the programs from this book directly into your computer. In this case, the only kind of license available to you is the free “immediate license” (see below). You are not authorized to transfer or distribute a machine-readable copy to any other person, nor to have any other person type the programs into a computer on your behalf. We do not want to hear bug reports from you if you choose this option, because experience has shown that virtually all reported bugs in such cases are typing errors!

• You can download the Numerical Recipes programs electronically from the Numerical Recipes On-Line Software Store, located at http://www.nr.com, our Web site. All the files (Recipes and demonstration programs) are packaged as a single compressed file. You’ll need to purchase a license to download and unpack them. Any number of single-screen licenses can be purchased instantly (with discount for multiple screens) from the On-Line Store, with fees that depend on your operating system (Windows or Macintosh versus Linux or UNIX) and whether you are affiliated with an educational institution. Purchasing a single-screen license is also the way to start if you want to acquire a more general (site or corporate) license; your single-screen cost will be subtracted from the cost of any later license upgrade.

• You can purchase media containing the programs from Cambridge University Press. A CD-ROM version in ISO-9660 format for Windows and Macintosh systems contains the complete C software, and also the C++ version. More extensive CD-ROMs in ISO-9660 format for Windows, Macintosh, and UNIX/Linux systems are also available; these include the C, C++, and Fortran versions on a single CD-ROM (as well as versions in Pascal and BASIC from the first edition). These CD-ROMs are available with a single-screen license for Windows or Macintosh (order ISBN 0 521 75035 0), or (at a slightly higher price) with a single-screen license for UNIX/Linux workstations (order ISBN 0 521 75036 9). Orders for media from Cambridge University Press can be placed at 800 872-7423 (North America only) or by email to [email protected] (North America) or [email protected] (rest of world). Or, visit the Web site http://www.cambridge.org.

Types of License Offered

Here are the types of licenses that we offer. Note that some types are automatically acquired with the purchase of media from Cambridge University Press, or of an unlocking password from the Numerical Recipes On-Line Software Store, while other types of licenses require that you communicate specifically with Numerical Recipes Software (email: [email protected] or fax: 781 863-1739). Our Web site http://www.nr.com has additional information.

• [“Immediate License”] If you are the individual owner of a copy of this book and you type one or more of its routines into your computer, we authorize you to use them on that computer for your own personal and noncommercial purposes. You are not authorized to transfer or distribute machine-readable copies to any other person, or to use the routines on more than one machine, or to distribute executable programs containing our routines. This is the only free license.

• [“Single-Screen License”] This is the most common type of low-cost license, with terms governed by our Single Screen (Shrinkwrap) License document (complete terms available through our Web site). Basically, this license lets you use Numerical Recipes routines on any one screen (PC, workstation, X-terminal, etc.). You may also, under this license, transfer pre-compiled, executable programs incorporating our routines to other, unlicensed, screens or computers, providing that (i) your application is noncommercial (i.e., does not involve the selling of your program for a fee), (ii) the programs were first developed, compiled, and successfully run on a licensed screen, and (iii) our routines are bound into the programs in such a manner that they cannot be accessed as individual routines and cannot practicably be unbound and used in other programs. That is, under this license, your program user must not be able to use our programs as part of a program library or “mix-and-match” workbench. Conditions for other types of commercial or noncommercial distribution may be found on our Web site (http://www.nr.com).

• [“Multi-Screen, Server, Site, and Corporate Licenses”] The terms of the Single Screen License can be extended to designated groups of machines, defined by number of screens, number of machines, locations, or ownership. Significant discounts from the corresponding single-screen prices are available when the estimated number of screens exceeds 40. Contact Numerical Recipes Software (email: [email protected] or fax: 781 863-1739) for details.

• [“Course Right-to-Copy License”] Instructors at accredited educational institutions who have adopted this book for a course, and who have already purchased a Single Screen License (either acquired with the purchase of media, or from the Numerical Recipes On-Line Software Store), may license the programs for use in that course as follows: Mail your name, title, and address; the course name, number, dates, and estimated enrollment; and advance payment of $5 per (estimated) student to Numerical Recipes Software, at this address: P.O. Box 243, Cambridge, MA 02238 (USA). You will receive by return mail a license authorizing you to make copies of the programs for use by your students, and/or to transfer the programs to a machine accessible to your students (but only for the duration of the course).

About Copyrights on Computer Programs

Like artistic or literary compositions, computer programs are protected by copyright. Generally it is an infringement for you to copy into your computer a program from a copyrighted source. (It is also not a friendly thing to do, since it deprives the program’s author of compensation for his or her creative effort.) Under copyright law, all “derivative works” (modified versions, or translations into another computer language) also come under the same copyright as the original work.

Copyright does not protect ideas, but only the expression of those ideas in a particular form. In the case of a computer program, the ideas consist of the program’s methodology and algorithm, including the necessary sequence of steps adopted by the programmer. The expression of those ideas is the program source code (particularly any arbitrary or stylistic choices embodied in it), its derived object code, and any other derivative works.

If you analyze the ideas contained in a program, and then express those ideas in your own completely different implementation, then that new program implementation belongs to you. That is what we have done for those programs in this book that are not entirely of our own devising. When programs in this book are said to be “based” on programs published in copyright sources, we mean that the ideas are the same. The expression of these ideas as source code is our own. We believe that no material in this book infringes on an existing copyright.

Trademarks

Several registered trademarks appear within the text of this book: Sun is a trademark of Sun Microsystems, Inc. SPARC and SPARCstation are trademarks of SPARC International, Inc. Microsoft, Windows 95, Windows NT, PowerStation, and MS are trademarks of Microsoft Corporation. DEC, VMS, Alpha AXP, and ULTRIX are trademarks of Digital Equipment Corporation. IBM is a trademark of International Business Machines Corporation. Apple and Macintosh are trademarks of Apple Computer, Inc. UNIX is a trademark licensed exclusively through X/Open Co. Ltd. IMSL is a trademark of Visual Numerics, Inc. NAG refers to proprietary computer software of Numerical Algorithms Group (USA) Inc. PostScript and Adobe Illustrator are trademarks of Adobe Systems Incorporated. Last, and no doubt least, Numerical Recipes (when identifying products) is a trademark of Numerical Recipes Software.

Attributions

The fact that ideas are legally “free as air” in no way supersedes the ethical requirement that ideas be credited to their known originators. When programs in this book are based on known sources, whether copyrighted or in the public domain, published or “handed-down,” we have attempted to give proper attribution. Unfortunately, the lineage of many programs in common circulation is often unclear. We would be grateful to readers for new or corrected information regarding attributions, which we will attempt to incorporate in subsequent printings.

Computer Programs by Chapter and Section

1.0 flmoon calculate phases of the moon by date
1.1 julday Julian Day number from calendar date
1.1 badluk Friday the 13th when the moon is full
1.1 caldat calendar date from Julian day number
2.1 gaussj Gauss-Jordan matrix inversion and linear equation solution
2.3 ludcmp linear equation solution, LU decomposition
2.3 lubksb linear equation solution, backsubstitution
2.4 tridag solution of tridiagonal systems
2.4 banmul multiply vector by band diagonal matrix
2.4 bandec band diagonal systems, decomposition
2.4 banbks band diagonal systems, backsubstitution
2.5 mprove linear equation solution, iterative improvement
2.6 svbksb singular value backsubstitution
2.6 svdcmp singular value decomposition of a matrix
2.6 pythag calculate (a² + b²)^(1/2) without overflow (see the C sketch following this list)
2.7 cyclic solution of cyclic tridiagonal systems
2.7 sprsin convert matrix to sparse format
2.7 sprsax product of sparse matrix and vector
2.7 sprstx product of transpose sparse matrix and vector
2.7 sprstp transpose of sparse matrix
2.7 sprspm pattern multiply two sparse matrices
2.7 sprstm threshold multiply two sparse matrices
2.7 linbcg biconjugate gradient solution of sparse systems
2.7 snrm used by linbcg for vector norm
2.7 atimes used by linbcg for sparse multiplication
2.7 asolve used by linbcg for preconditioner
2.8 vander solve Vandermonde systems
2.8 toeplz solve Toeplitz systems
2.9 choldc Cholesky decomposition
2.9 cholsl Cholesky backsubstitution
2.10 qrdcmp QR decomposition
2.10 qrsolv QR backsubstitution
2.10 rsolv right triangular backsubstitution
2.10 qrupdt update a QR decomposition
2.10 rotate Jacobi rotation used by qrupdt
3.1 polint polynomial interpolation
3.2 ratint rational function interpolation
3.3 spline construct a cubic spline
3.3 splint cubic spline interpolation
3.4 locate search an ordered table by bisection

3.4 hunt search a table when calls are correlated
3.5 polcoe polynomial coefficients from table of values
3.5 polcof polynomial coefficients from table of values
3.6 polin2 two-dimensional polynomial interpolation
3.6 bcucof construct two-dimensional bicubic
3.6 bcuint two-dimensional bicubic interpolation
3.6 splie2 construct two-dimensional spline
3.6 splin2 two-dimensional spline interpolation
4.2 trapzd trapezoidal rule
4.2 qtrap integrate using trapezoidal rule
4.2 qsimp integrate using Simpson’s rule
4.3 qromb integrate using Romberg adaptive method
4.4 midpnt extended midpoint rule
4.4 qromo integrate using open Romberg adaptive method
4.4 midinf integrate a function on a semi-infinite interval
4.4 midsql integrate a function with lower square-root singularity
4.4 midsqu integrate a function with upper square-root singularity
4.4 midexp integrate a function that decreases exponentially
4.5 qgaus integrate a function by Gaussian quadratures
4.5 gauleg Gauss-Legendre weights and abscissas
4.5 gaulag Gauss-Laguerre weights and abscissas
4.5 gauher Gauss-Hermite weights and abscissas
4.5 gaujac Gauss-Jacobi weights and abscissas
4.5 gaucof quadrature weights from orthogonal polynomials
4.5 orthog construct nonclassical orthogonal polynomials
4.6 quad3d integrate a function over a three-dimensional space
5.1 eulsum sum a series by Euler–van Wijngaarden algorithm
5.3 ddpoly evaluate a polynomial and its derivatives
5.3 poldiv divide one polynomial by another
5.3 ratval evaluate a rational function
5.7 dfridr numerical derivative by Ridders’ method
5.8 chebft fit a Chebyshev polynomial to a function
5.8 chebev Chebyshev polynomial evaluation
5.9 chder derivative of a function already Chebyshev fitted
5.9 chint integrate a function already Chebyshev fitted
5.10 chebpc polynomial coefficients from a Chebyshev fit
5.10 pcshft polynomial coefficients of a shifted polynomial
5.11 pccheb inverse of chebpc; use to economize power series
5.12 pade Padé approximant from power series coefficients
5.13 ratlsq rational fit by least-squares method
6.1 gammln logarithm of gamma function
6.1 factrl factorial function
6.1 bico binomial coefficients function
6.1 factln logarithm of factorial function

6.1 beta beta function
6.2 gammp incomplete gamma function
6.2 gammq complement of incomplete gamma function
6.2 gser series used by gammp and gammq
6.2 gcf continued fraction used by gammp and gammq
6.2 erff error function
6.2 erffc complementary error function
6.2 erfcc complementary error function, concise routine
6.3 expint exponential integral En
6.3 ei exponential integral Ei
6.4 betai incomplete beta function
6.4 betacf continued fraction used by betai
6.5 bessj0 Bessel function J0
6.5 bessy0 Bessel function Y0
6.5 bessj1 Bessel function J1
6.5 bessy1 Bessel function Y1
6.5 bessy Bessel function Y of general integer order
6.5 bessj Bessel function J of general integer order
6.6 bessi0 modified Bessel function I0
6.6 bessk0 modified Bessel function K0
6.6 bessi1 modified Bessel function I1
6.6 bessk1 modified Bessel function K1
6.6 bessk modified Bessel function K of integer order
6.6 bessi modified Bessel function I of integer order
6.7 bessjy Bessel functions of fractional order
6.7 beschb Chebyshev expansion used by bessjy
6.7 bessik modified Bessel functions of fractional order
6.7 airy Airy functions
6.7 sphbes spherical Bessel functions jn and yn
6.8 plgndr Legendre polynomials, associated (spherical harmonics)
6.9 frenel Fresnel integrals S(x) and C(x)
6.9 cisi cosine and sine integrals Ci and Si
6.10 dawson Dawson’s integral
6.11 rf Carlson’s elliptic integral of the first kind
6.11 rd Carlson’s elliptic integral of the second kind
6.11 rj Carlson’s elliptic integral of the third kind
6.11 rc Carlson’s degenerate elliptic integral
6.11 ellf Legendre elliptic integral of the first kind
6.11 elle Legendre elliptic integral of the second kind
6.11 ellpi Legendre elliptic integral of the third kind
6.11 sncndn Jacobian elliptic functions
6.12 hypgeo complex hypergeometric function
6.12 hypser complex hypergeometric function, series evaluation
6.12 hypdrv complex hypergeometric function, derivative of
7.1 ran0 random deviate by Park and Miller minimal standard
7.1 ran1 random deviate, minimal standard plus shuffle

7.1 ran2 random deviate by L’Ecuyer long period plus shuffle
7.1 ran3 random deviate by Knuth subtractive method
7.2 expdev exponential random deviates
7.2 gasdev normally distributed random deviates
7.3 gamdev gamma-law distribution random deviates
7.3 poidev Poisson distributed random deviates
7.3 bnldev binomial distributed random deviates
7.4 irbit1 random bit sequence
7.4 irbit2 random bit sequence
7.5 psdes “pseudo-DES” hashing of 64 bits
7.5 ran4 random deviates from DES-like hashing
7.7 sobseq Sobol’s quasi-random sequence
7.8 vegas adaptive multidimensional Monte Carlo integration
7.8 rebin sample rebinning used by vegas
7.8 miser recursive multidimensional Monte Carlo integration
7.8 ranpt get random point, used by miser
8.1 piksrt sort an array by straight insertion
8.1 piksr2 sort two arrays by straight insertion
8.1 shell sort an array by Shell’s method
8.2 sort sort an array by quicksort method
8.2 sort2 sort two arrays by quicksort method
8.3 hpsort sort an array by heapsort method
8.4 indexx construct an index for an array
8.4 sort3 sort, use an index to sort 3 or more arrays
8.4 rank construct a rank table for an array
8.5 select find the Nth largest in an array
8.5 selip find the Nth largest, without altering an array
8.5 hpsel find M largest values, without altering an array
8.6 eclass determine equivalence classes from list
8.6 eclazz determine equivalence classes from procedure
9.0 scrsho graph a function to search for roots
9.1 zbrac outward search for brackets on roots
9.1 zbrak inward search for brackets on roots
9.1 rtbis find root of a function by bisection
9.2 rtflsp find root of a function by false-position
9.2 rtsec find root of a function by secant method
9.2 zriddr find root of a function by Ridders’ method
9.3 zbrent find root of a function by Brent’s method
9.4 rtnewt find root of a function by Newton-Raphson
9.4 rtsafe find root of a function by Newton-Raphson and bisection
9.5 laguer find a root of a polynomial by Laguerre’s method
9.5 zroots roots of a polynomial by Laguerre’s method with deflation
9.5 zrhqr roots of a polynomial by eigenvalue methods
9.5 qroot complex or double root of a polynomial, Bairstow

9.6 mnewt Newton’s method for systems of equations
9.7 lnsrch search along a line, used by newt
9.7 newt globally convergent multi-dimensional Newton’s method
9.7 fdjac finite-difference Jacobian, used by newt
9.7 fmin norm of a vector function, used by newt
9.7 broydn secant method for systems of equations
10.1 mnbrak bracket the minimum of a function
10.1 golden find minimum of a function by golden section search
10.2 brent find minimum of a function by Brent’s method
10.3 dbrent find minimum of a function using derivative information
10.4 amoeba minimize in N-dimensions by downhill simplex method
10.4 amotry evaluate a trial point, used by amoeba
10.5 powell minimize in N-dimensions by Powell’s method
10.5 linmin minimum of a function along a ray in N-dimensions
10.5 f1dim function used by linmin
10.6 frprmn minimize in N-dimensions by conjugate gradient
10.6 dlinmin minimum of a function along a ray using derivatives
10.6 df1dim function used by dlinmin
10.7 dfpmin minimize in N-dimensions by variable metric method
10.8 simplx linear programming maximization of a linear function
10.8 simp1 linear programming, used by simplx
10.8 simp2 linear programming, used by simplx
10.8 simp3 linear programming, used by simplx
10.9 anneal traveling salesman problem by simulated annealing
10.9 revcst cost of a reversal, used by anneal
10.9 reverse do a reversal, used by anneal
10.9 trncst cost of a transposition, used by anneal
10.9 trnspt do a transposition, used by anneal
10.9 metrop Metropolis algorithm, used by anneal
10.9 amebsa simulated annealing in continuous spaces
10.9 amotsa evaluate a trial point, used by amebsa
11.1 jacobi eigenvalues and eigenvectors of a symmetric matrix
11.1 eigsrt eigenvectors, sorts into order by eigenvalue
11.2 tred2 Householder reduction of a real, symmetric matrix
11.3 tqli eigensolution of a symmetric tridiagonal matrix
11.5 balanc balance a nonsymmetric matrix
11.5 elmhes reduce a general matrix to Hessenberg form
11.6 hqr eigenvalues of a Hessenberg matrix
12.2 four1 fast Fourier transform (FFT) in one dimension
12.3 twofft fast Fourier transform of two real functions
12.3 realft fast Fourier transform of a single real function
12.3 sinft fast sine transform
12.3 cosft1 fast cosine transform with endpoints
12.3 cosft2 “staggered” fast cosine transform

12.4   fourn     fast Fourier transform in multidimensions
12.5   rlft3     FFT of real data in two or three dimensions
12.6   fourfs    FFT for huge data sets on external media
12.6   fourew    rewind and permute files, used by fourfs
13.1   convlv    convolution or deconvolution of data using FFT
13.2   correl    correlation or autocorrelation of data using FFT
13.4   spctrm    power spectrum estimation using FFT
13.6   memcof    evaluate maximum entropy (MEM) coefficients
13.6   fixrts    reflect roots of a polynomial into unit circle
13.6   predic    linear prediction using MEM coefficients
13.7   evlmem    power spectral estimation from MEM coefficients
13.8   period    power spectrum of unevenly sampled data
13.8   fasper    power spectrum of unevenly sampled larger data sets
13.8   spread    extirpolate value into array, used by fasper
13.9   dftcor    compute endpoint corrections for Fourier integrals
13.9   dftint    high-accuracy Fourier integrals
13.10  wt1       one-dimensional discrete wavelet transform
13.10  daub4     Daubechies 4-coefficient wavelet filter
13.10  pwtset    initialize coefficients for pwt
13.10  pwt       partial wavelet transform
13.10  wtn       multidimensional discrete wavelet transform
14.1   moment    calculate moments of a data set
14.2   ttest     Student's t-test for difference of means
14.2   avevar    calculate mean and variance of a data set
14.2   tutest    Student's t-test for means, case of unequal variances
14.2   tptest    Student's t-test for means, case of paired data
14.2   ftest     F-test for difference of variances
14.3   chsone    chi-square test for difference between data and model
14.3   chstwo    chi-square test for difference between two data sets
14.3   ksone     Kolmogorov-Smirnov test of data against model
14.3   kstwo     Kolmogorov-Smirnov test between two data sets
14.3   probks    Kolmogorov-Smirnov probability function
14.4   cntab1    contingency table analysis using chi-square
14.4   cntab2    contingency table analysis using entropy measure
14.5   pearsn    Pearson's correlation between two data sets
14.6   spear     Spearman's rank correlation between two data sets
14.6   crank     replaces array elements by their rank
14.6   kendl1    correlation between two data sets, Kendall's tau
14.6   kendl2    contingency table analysis using Kendall's tau
14.7   ks2d1s    K–S test in two dimensions, data vs. model
14.7   quadct    count points by quadrants, used by ks2d1s
14.7   quadvl    quadrant probabilities, used by ks2d1s
14.7   ks2d2s    K–S test in two dimensions, data vs. data
14.8   savgol    Savitzky-Golay smoothing coefficients

15.2   fit       least-squares fit data to a straight line
15.3   fitexy    fit data to a straight line, errors in both x and y
15.3   chixy     used by fitexy to calculate a χ²
15.4   lfit      general linear least-squares fit by normal equations
15.4   covsrt    rearrange covariance matrix, used by lfit
15.4   svdfit    linear least-squares fit by singular value decomposition
15.4   svdvar    variances from singular value decomposition
15.4   fpoly     fit a polynomial using lfit or svdfit
15.4   fleg      fit a Legendre polynomial using lfit or svdfit
15.5   mrqmin    nonlinear least-squares fit, Marquardt's method
15.5   mrqcof    used by mrqmin to evaluate coefficients
15.5   fgauss    fit a sum of Gaussians using mrqmin
15.7   medfit    fit data to a straight line robustly, least absolute deviation
15.7   rofunc    fit data robustly, used by medfit
16.1   rk4       integrate one step of ODEs, fourth-order Runge-Kutta
16.1   rkdumb    integrate ODEs by fourth-order Runge-Kutta
16.2   rkqs      integrate one step of ODEs with accuracy monitoring
16.2   rkck      Cash-Karp-Runge-Kutta step used by rkqs
16.2   odeint    integrate ODEs with accuracy monitoring
16.3   mmid      integrate ODEs by modified midpoint method
16.4   bsstep    integrate ODEs, Bulirsch-Stoer step
16.4   pzextr    polynomial extrapolation, used by bsstep
16.4   rzextr    rational function extrapolation, used by bsstep
16.5   stoerm    integrate conservative second-order ODEs
16.6   stiff     integrate stiff ODEs by fourth-order Rosenbrock
16.6   jacobn    sample Jacobian routine for stiff
16.6   derivs    sample derivatives routine for stiff
16.6   simpr     integrate stiff ODEs by semi-implicit midpoint rule
16.6   stifbs    integrate stiff ODEs, Bulirsch-Stoer step
17.1   shoot     solve two point boundary value problem by shooting
17.2   shootf    ditto, by shooting to a fitting point
17.3   solvde    two point boundary value problem, solve by relaxation
17.3   bksub     backsubstitution, used by solvde
17.3   pinvs     diagonalize a sub-block, used by solvde
17.3   red       reduce columns of a matrix, used by solvde
17.4   sfroid    spheroidal functions by method of solvde
17.4   difeq     spheroidal matrix coefficients, used by sfroid
17.4   sphoot    spheroidal functions by method of shoot
17.4   sphfpt    spheroidal functions by method of shootf
18.1   fred2     solve linear Fredholm equations of the second kind
18.1   fredin    interpolate solutions obtained with fred2
18.2   voltra    linear Volterra equations of the second kind
18.3   wwghts    quadrature weights for an arbitrarily singular kernel
18.3   kermom    sample routine for moments of a singular kernel

18.3   quadmx    sample routine for a quadrature matrix
18.3   fredex    example of solving a singular Fredholm equation
19.5   sor       elliptic PDE solved by successive overrelaxation method
19.6   mglin     linear elliptic PDE solved by multigrid method
19.6   rstrct    half-weighting restriction, used by mglin, mgfas
19.6   interp    bilinear prolongation, used by mglin, mgfas
19.6   addint    interpolate and add, used by mglin
19.6   slvsml    solve on coarsest grid, used by mglin
19.6   relax     Gauss-Seidel relaxation, used by mglin
19.6   resid     calculate residual, used by mglin
19.6   copy      utility used by mglin, mgfas
19.6   fill0     utility used by mglin
19.6   mgfas     nonlinear elliptic PDE solved by multigrid method
19.6   relax2    Gauss-Seidel relaxation, used by mgfas
19.6   slvsm2    solve on coarsest grid, used by mgfas
19.6   lop       applies nonlinear operator, used by mgfas
19.6   matadd    utility used by mgfas
19.6   matsub    utility used by mgfas
19.6   anorm2    utility used by mgfas
20.1   machar    diagnose computer's floating arithmetic
20.2   igray     Gray code and its inverse
20.3   icrc1     cyclic redundancy checksum, used by icrc
20.3   icrc      cyclic redundancy checksum
20.3   decchk    decimal check digit calculation or verification
20.4   hufmak    construct a Huffman code
20.4   hufapp    append bits to a Huffman code, used by hufmak
20.4   hufenc    use Huffman code to encode and compress a character
20.4   hufdec    use Huffman code to decode and decompress a character
20.5   arcmak    construct an arithmetic code
20.5   arcode    encode or decode a character using arithmetic coding
20.5   arcsum    add integer to byte string, used by arcode
20.6   mpops     multiple precision arithmetic, simpler operations
20.6   mpmul     multiple precision multiply, using FFT methods
20.6   mpinv     multiple precision reciprocal
20.6   mpdiv     multiple precision divide and remainder
20.6   mpsqrt    multiple precision square root
20.6   mp2dfr    multiple precision conversion to decimal base
20.6   mppi      multiple precision example, compute many digits of π

Chapter 1. Preliminaries

1.0 Introduction

This book, like its predecessor edition, is supposed to teach you methods of numerical computing that are practical, efficient, and (insofar as possible) elegant. We presume throughout this book that you, the reader, have particular tasks that you want to get done. We view our job as educating you on how to proceed. Occasionally we may try to reroute you briefly onto a particularly beautiful side road; but by and large, we will guide you along main highways that lead to practical destinations.

Throughout this book, you will find us fearlessly editorializing, telling you what you should and shouldn't do. This prescriptive tone results from a conscious decision on our part, and we hope that you will not find it irritating. We do not claim that our advice is infallible! Rather, we are reacting against a tendency, in the textbook literature of computation, to discuss every possible method that has ever been invented, without ever offering a practical judgment on relative merit. We do, therefore, offer you our practical judgments whenever we can. As you gain experience, you will form your own opinion of how reliable our advice is.

We presume that you are able to read computer programs in C, that being the language of this version of Numerical Recipes (Second Edition). The book Numerical Recipes in FORTRAN (Second Edition) is separately available, if you prefer to program in that language. Earlier editions of Numerical Recipes in Pascal and Numerical Recipes Routines and Examples in BASIC are also available; while not containing the additional material of the Second Edition versions in C and FORTRAN, these versions are perfectly serviceable if Pascal or BASIC is your language of choice.

When we include programs in the text, they look like this:

#include <math.h>
#define RAD (3.14159265/180.0)

void flmoon(int n, int nph, long *jd, float *frac)
Our programs begin with an introductory comment summarizing their purpose and explaining their calling sequence. This routine calculates the phases of the moon. Given an integer n and a code nph for the phase desired (nph = 0 for new moon, 1 for first quarter, 2 for full, 3 for last quarter), the routine returns the Julian Day Number jd, and the fractional part of a day frac to be added to it, of the nth such phase since January, 1900. Greenwich Mean Time is assumed.
{
	void nrerror(char error_text[]);
	int i;
	float am,as,c,t,t2,xtra;

	c=n+nph/4.0;                              This is how we comment an individual line.

	t=c/1236.85;
	t2=t*t;
	as=359.2242+29.105356*c;                  You aren't really intended to understand
	am=306.0253+385.816918*c+0.010730*t2;     this algorithm, but it does work!
	*jd=2415020+28L*n+7L*nph;
	xtra=0.75933+1.53058868*c+((1.178e-4)-(1.55e-7)*t)*t2;
	if (nph == 0 || nph == 2)
		xtra += (0.1734-3.93e-4*t)*sin(RAD*as)-0.4068*sin(RAD*am);
	else if (nph == 1 || nph == 3)
		xtra += (0.1721-4.0e-4*t)*sin(RAD*as)-0.6280*sin(RAD*am);
	else nrerror("nph is unknown in flmoon");     This is how we will indicate error conditions.
	i=(int)(xtra >= 0.0 ? floor(xtra) : ceil(xtra-1.0));
	*jd += i;
	*frac=xtra-i;
}

If the syntax of the function definition above looks strange to you, then you are probably used to the older Kernighan and Ritchie ("K&R") syntax, rather than that of the newer ANSI C. In this edition, we adopt ANSI C as our standard. You might want to look ahead to §1.2 where ANSI C function prototypes are discussed in more detail.

Note our convention of handling all errors and exceptional cases with a statement like nrerror("some error message");. The function nrerror() is part of a small file of utility programs, nrutil.c, listed in Appendix B at the back of the book. This Appendix includes a number of other utilities that we will describe later in this chapter. Function nrerror() prints the indicated error message to your stderr device (usually your terminal screen), and then invokes the function exit(), which terminates execution. The function exit() is in every C library we know of; but if you find it missing, you can modify nrerror() so that it does anything else that will halt execution. For example, you can have it pause for input from the keyboard, and then manually interrupt execution. In some applications, you will want to modify nrerror() to do more sophisticated error handling, for example to transfer control somewhere else, with an error flag or error code set.
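For concreteness, here is a minimal sketch of such a routine, modeled on the nrutil.c utility just described; Appendix B has the authoritative listing, and the body below is the natural place to substitute fancier handling of your own:

#include <stdio.h>
#include <stdlib.h>

void nrerror(char error_text[])
/* Report a run-time error to stderr, then halt.  A sketch modeled on
   the nrutil.c utility; replace exit(1) if you need to do otherwise. */
{
	fprintf(stderr,"Numerical Recipes run-time error...\n");
	fprintf(stderr,"%s\n",error_text);
	fprintf(stderr,"...now exiting to system...\n");
	exit(1);
}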
We will have more to say about the C programming language, its conventions and style, in §1.1 and §1.2.

Computational Environment and Program Validation

Our goal is that the programs in this book be as portable as possible, across different platforms (models of computer), across different operating systems, and across different C compilers. C was designed with this type of portability in mind. Nevertheless, we have found that there is no substitute for actually checking all programs on a variety of compilers, in the process uncovering differences in library structure or contents, and even occasional differences in allowed syntax. As surrogates for the large number of possible combinations, we have tested all the programs in this book on the combinations of machines, operating systems, and compilers shown on the accompanying table. More generally, the programs should run without modification on any compiler that implements the ANSI C standard, as described for example in Harbison and Steele's excellent book [1]. With small modifications, our programs should run on any compiler that implements the older, de facto K&R standard [2]. An example of the kind of trivial incompatibility to watch out for is that ANSI C requires the memory allocation functions malloc() and free() to be declared via the header stdlib.h; some older compilers require them to be declared with the header file malloc.h, while others regard them as inherent in the language and require no header file at all.
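That particular difference can be absorbed with a compile-time switch. The fragment below is our illustrative sketch, not from the book; the ANSI macro here is simply whatever symbol your build environment defines (compare the __STDC__, ANSI, and NRANSI macros discussed at the end of §1.2):

#ifdef ANSI
#include <stdlib.h>     /* ANSI C declares malloc() and free() here */
#else
#include <malloc.h>     /* many older K&R compilers declare them here */
#endif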

Tested Machines and Compilers

Hardware                    O/S Version               Compiler Version
IBM PC compatible 486/33    MS-DOS 5.0/Windows 3.1    Microsoft C/C++ 7.0
IBM PC compatible 486/33    MS-DOS 5.0                Borland C/C++ 2.0
IBM RS/6000                 AIX 3.2                   IBM xlc 1.02
DECstation 5000/25          ULTRIX 4.2A               CodeCenter (Saber) C 3.1.1
DECsystem 5400              ULTRIX 4.1                GNU C Compiler 2.1
Sun SPARCstation 2          SunOS 4.1                 GNU C Compiler 1.40
DECstation 5000/200         ULTRIX 4.2                DEC RISC C 2.1*
Sun SPARCstation 2          SunOS 4.1                 Sun cc 1.1*

*compiler version does not fully implement ANSI C; only K&R validated

In validating the programs, we have taken the program source code directly from the machine-readable form of the book's manuscript, to decrease the chance of propagating typographical errors. "Driver" or demonstration programs that we used as part of our validations are available separately as the Numerical Recipes Example Book (C), as well as in machine-readable form. If you plan to use more than a few of the programs in this book, or if you plan to use programs in this book on more than one different computer, then you may find it useful to obtain a copy of these demonstration programs.

Of course we would be foolish to claim that there are no bugs in our programs, and we do not make such a claim. We have been very careful, and have benefitted from the experience of the many readers who have written to us. If you find a new bug, please document it and tell us!

Compatibility with the First Edition

If you are accustomed to the Numerical Recipes routines of the First Edition, rest assured: almost all of them are still here, with the same names and functionalities, often with major improvements in the code itself. In addition, we hope that you will soon become equally familiar with the added capabilities of the more than 100 routines that are new to this edition.

We have retired a small number of First Edition routines, those that we believe to be clearly dominated by better methods implemented in this edition. A table, following, lists the retired routines and suggests replacements.

First Edition users should also be aware that some routines common to both editions have alterations in their calling interfaces, so are not directly "plug compatible." A fairly complete list is: chsone, chstwo, covsrt, dfpmin, laguer, lfit, memcof, mrqcof, mrqmin, pzextr, ran4, realft, rzextr, shoot, shootf. There may be others (depending in part on which printing of the First Edition is taken for the comparison).

Previous Routines Omitted from This Edition

Name(s)          Replacement(s)                             Comment
adi              mglin or mgfas                             better method
cosft            cosft1 or cosft2                           choice of boundary conditions
cel, el2         rf, rd, rj, rc                             better algorithms
des, desks       ran4 now uses psdes                        was too slow
mdian1, mdian2   select, selip                              more general
qcksrt           sort                                       name change (sort is now hpsort)
rkqc             rkqs                                       better method
smooft           use convlv with coefficients from savgol
sparse           linbcg                                     more general

If you have written software of any appreciable complexity that is dependent on First Edition routines, we do not recommend blindly replacing them by the corresponding routines in this book. We do recommend that any new programming efforts use the new routines.

About References

You will find references, and suggestions for further reading, listed at the end of most sections of this book. References are cited in the text by bracketed numbers like this [3].

Because computer algorithms often circulate informally for quite some time before appearing in a published form, the task of uncovering "primary literature" is sometimes quite difficult. We have not attempted this, and we do not pretend to any degree of bibliographical completeness in this book. For topics where a substantial secondary literature exists (discussion in textbooks, reviews, etc.) we have consciously limited our references to a few of the more useful secondary sources, especially those with good references to the primary literature. Where the existing secondary literature is insufficient, we give references to a few primary sources that are intended to serve as starting points for further reading, not as complete bibliographies for the field.

The order in which references are listed is not necessarily significant. It reflects a compromise between listing cited references in the order cited, and listing suggestions for further reading in a roughly prioritized order, with the most useful ones first.

The remaining three sections of this chapter review some basic concepts of programming (control structures, etc.), discuss a set of conventions specific to C that we have adopted in this book, and introduce some fundamental concepts in numerical analysis (roundoff error, etc.). Thereafter, we plunge into the substantive material of the book.

CITED REFERENCES AND FURTHER READING:
Harbison, S.P., and Steele, G.L., Jr. 1991, C: A Reference Manual, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall). [1]

Kernighan, B., and Ritchie, D. 1978, The C Programming Language (Englewood Cliffs, NJ: Prentice-Hall). [2] [Reference for K&R "traditional" C. Later editions of this book conform to the ANSI C standard.]
Meeus, J. 1982, Astronomical Formulae for Calculators, 2nd ed., revised and enlarged (Richmond, VA: Willmann-Bell). [3]

1.1 Program Organization and Control Structures

We sometimes like to point out the close analogies between computer programs, on the one hand, and written poetry or written musical scores, on the other. All three present themselves as visual media, symbols on a two-dimensional page or computer screen. Yet, in all three cases, the visual, two-dimensional, frozen-in-time representation communicates (or is supposed to communicate) something rather different, namely a process that unfolds in time. A poem is meant to be read; music, played; a program, executed as a sequential series of computer instructions.

In all three cases, the target of the communication, in its visual form, is a human being. The goal is to transfer to him/her, as efficiently as can be accomplished, the greatest degree of understanding, in advance, of how the process will unfold in time. In poetry, this human target is the reader. In music, it is the performer. In programming, it is the program user.

Now, you may object that the target of communication of a program is not a human but a computer, that the program user is only an irrelevant intermediary, a lackey who feeds the machine. This is perhaps the case in the situation where the business executive pops a diskette into a desktop computer and feeds that computer a black-box program in binary executable form. The computer, in this case, doesn't much care whether that program was written with "good programming practice" or not.

We envision, however, that you, the readers of this book, are in quite a different situation. You need, or want, to know not just what a program does, but also how it does it, so that you can tinker with it and modify it to your particular application. You need others to be able to see what you have done, so that they can criticize or admire. In such cases, where the desired goal is reusable, maintainable code, the targets of a program's communication are surely human, not machine.

One key to achieving good programming practice is to recognize that programming, music, and poetry — all three being symbolic constructs of the human brain — are naturally structured into hierarchies that have many different nested levels. Sounds (phonemes) form small meaningful units (morphemes) which in turn form words; words group into phrases, which group into sentences; sentences make paragraphs, and these are organized into higher levels of meaning.
Notes form musical phrases, which form themes, counterpoints, harmonies, etc.; which form movements, which form concertos, symphonies, and so on.

The structure in programs is equally hierarchical. Appropriately, good programming practice brings different techniques to bear on the different levels [1-3]. At a low level is the ascii character set. Then, constants, identifiers, operands, operators.

Then program statements, like a[j+1]=b+c/3.0;. Here, the best programming advice is simply be clear, or (correspondingly) don't be too tricky. You might momentarily be proud of yourself at writing the single line

k=(2-j)*(1+3*j)/2;

if you want to permute cyclically one of the values j = (0, 1, 2) into respectively k = (1, 2, 0). You will regret it later, however, when you try to understand that line. Better, and likely also faster, is

k=j+1;
if (k == 3) k=0;

Many programming stylists would even argue for the ploddingly literal

switch (j) {
	case 0: k=1; break;
	case 1: k=2; break;
	case 2: k=0; break;
	default: {
		fprintf(stderr,"unexpected value for j");
		exit(1);
	}
}

on the grounds that it is both clear and additionally safeguarded from wrong assumptions about the possible values of j. Our preference among the implementations is for the middle one.

In this simple example, we have in fact traversed several levels of hierarchy: Statements frequently come in "groups" or "blocks" which make sense only taken as a whole. The middle fragment above is one example. Another is

swap=a[j];
a[j]=b[j];
b[j]=swap;

which makes immediate sense to any programmer as the exchange of two variables, while

ans=sum=0.0;
n=1;

is very likely to be an initialization of variables prior to some iterative process. This level of hierarchy in a program is usually evident to the eye. It is good programming practice to put in comments at this level, e.g., "initialize" or "exchange variables."

The next level is that of control structures. These are things like the switch construction in the example above, for loops, and so on. This level is sufficiently important, and relevant to the hierarchical level of the routines in this book, that we will come back to it just below.

At still higher levels in the hierarchy, we have functions and modules, and the whole "global" organization of the computational task to be done. In the musical analogy, we are now at the level of movements and complete works. At these levels,

modularization and encapsulation become important programming concepts, the general idea being that program units should interact with one another only through clearly defined and narrowly circumscribed interfaces. Good modularization practice is an essential prerequisite to the success of large, complicated software projects, especially those employing the efforts of more than one programmer. It is also good practice (if not quite as essential) in the less massive programming tasks that an individual scientist, or reader of this book, encounters.

Some computer languages, such as Modula-2 and C++, promote good modularization with higher-level language constructs absent in C. In Modula-2, for example, functions, type definitions, and data structures can be encapsulated into "modules" that communicate through declared public interfaces and whose internal workings are hidden from the rest of the program [4]. In the C++ language, the key concept is "class," a user-definable generalization of data type that provides for data hiding, automatic initialization of data, memory management, dynamic typing, and operator overloading (i.e., the user-definable extension of operators like + and * so as to be appropriate to operands in any particular class) [5]. Properly used in defining the data structures that are passed between program units, classes can clarify and circumscribe these units' public interfaces, reducing the chances of programming error and also allowing a considerable degree of compile-time and run-time error checking.

Beyond modularization, though depending on it, lie the concepts of object-oriented programming. Here a programming language, such as C++ or Turbo Pascal 5.5 [6], allows a module's public interface to accept redefinitions of types or actions, and these redefinitions become shared all the way down through the module's hierarchy (so-called polymorphism). For example, a routine written to invert a matrix of real numbers could — dynamically, at run time — be made able to handle complex numbers by overloading complex data types and corresponding definitions of the arithmetic operations. Additional concepts of inheritance (the ability to define a data type that "inherits" all the structure of another type, plus additional structure of its own), and object extensibility (the ability to add functionality to a module without access to its source code, e.g., at run time), also come into play.

We have not attempted to modularize, or make objects out of, the routines in this book, for at least two reasons. First, the chosen language, C, does not really make this possible. Second, we envision that you, the reader, might want to incorporate the algorithms in this book, a few at a time, into modules or objects with a structure of your own choosing.
There does not exist, at present, a standard or accepted set of "classes" for scientific object-oriented computing. While we might have tried to invent such a set, doing so would have inevitably tied the algorithmic content of the book (which is its raison d'être) to some rather specific, and perhaps haphazard, set of choices regarding class definitions.

On the other hand, we are not unfriendly to the goals of modular and object-oriented programming. Within the limits of C, we have therefore tried to structure our programs to be "object friendly." That is one reason we have adopted ANSI C with its function prototyping as our default C dialect (see §1.2). Also, within our implementation sections, we have paid particular attention to the practices of structured programming, as we now discuss.

Control Structures

An executing program unfolds in time, but not strictly in the linear order in which the statements are written. Program statements that affect the order in which statements are executed, or that affect whether statements are executed, are called control statements. Control statements never make useful sense by themselves. They make sense only in the context of the groups or blocks of statements that they in turn control. If you think of those blocks as paragraphs containing sentences, then the control statements are perhaps best thought of as the indentation of the paragraph and the punctuation between the sentences, not the words within the sentences.

We can now say what the goal of structured programming is. It is to make program control manifestly apparent in the visual presentation of the program. You see that this goal has nothing at all to do with how the computer sees the program. As already remarked, computers don't care whether you use structured programming or not. Human readers, however, do care. You yourself will also care, once you discover how much easier it is to perfect and debug a well-structured program than one whose control structure is obscure.

You accomplish the goals of structured programming in two complementary ways. First, you acquaint yourself with the small number of essential control structures that occur over and over again in programming, and that are therefore given convenient representations in most programming languages. You should learn to think about your programming tasks, insofar as possible, exclusively in terms of these standard control structures. In writing programs, you should get into the habit of representing these standard control structures in consistent, conventional ways.

"Doesn't this inhibit creativity?" our students sometimes ask. Yes, just as Mozart's creativity was inhibited by the sonata form, or Shakespeare's by the metrical requirements of the sonnet. The point is that creativity, when it is meant to communicate, does well under the inhibitions of appropriate restrictions on format.

Second, you avoid, insofar as possible, control statements whose controlled blocks or objects are difficult to discern at a glance. This means, in practice, that you must try to avoid named labels on statements and goto's. It is not the goto's that are dangerous (although they do interrupt one's reading of a program); the named statement labels are the hazard. In fact, whenever you encounter a named statement label while reading a program, you will soon become conditioned to get a sinking feeling in the pit of your stomach. Why? Because the following questions will, by habit, immediately spring to mind: Where did control come from in a branch to this label? It could be anywhere in the routine! What circumstances resulted in a branch to this label? They could be anything!
Certainty becomes uncertainty, understanding dissolves into a morass of possibilities.

Some examples are now in order to make these considerations more concrete (see Figure 1.1.1).

Catalog of Standard Structures

Iteration. In C, simple iteration is performed with a for loop, for example

for (j=2;j<=1000;j++) {
	b[j]=a[j-1];
	a[j-1]=j;
}

[Figure 1.1.1 appears here as flowcharts; panels (a) FOR iteration, (b) WHILE iteration, (c) DO WHILE iteration, and (d) BREAK iteration are on this page.]
Figure 1.1.1. Standard control structures used in structured programming: (a) for iteration; (b) while iteration; (c) do while iteration; (d) break iteration; (e) if structure; (f) switch structure.

[Figure 1.1.1, continued: panels (e) IF structure and (f) SWITCH structure; see caption above.]

Notice how we always indent the block of code that is acted upon by the control structure, leaving the structure itself unindented. Notice also our habit of putting the initial curly brace on the same line as the for statement, instead of on the next line. This saves a full line of white space, and our publisher loves us for it.

IF structure. This structure in C is similar to that found in Algol, Pascal, FORTRAN, and other languages, and typically looks like

if (...) {
	...
}
else if (...) {
	...
}
else {
	...
}

Since compound-statement curly braces are required only when there is more than one statement in a block, however, C's if construction can be somewhat less explicit than the corresponding structure in FORTRAN or Pascal. Some care must be exercised in constructing nested if clauses. For example, consider the following:

if (b > 3)
	if (a > 3) b += 1;
else b -= 1;     /* questionable! */

As judged by the indentation used on successive lines, the intent of the writer of this code is the following: 'If b is greater than 3 and a is greater than 3, then increment b. If b is not greater than 3, then decrement b.' According to the rules of C, however, the actual meaning is 'If b is greater than 3, then evaluate a. If a is greater than 3, then increment b, and if a is less than or equal to 3, decrement b.' The point is that an else clause is associated with the most recent open if statement, no matter how you lay it out on the page. Such confusions in meaning are easily resolved by the inclusion of braces. They may in some instances be technically superfluous; nevertheless, they clarify your intent and improve the program. The above fragment should be written as

if (b > 3) {
	if (a > 3) b += 1;
} else {
	b -= 1;
}

Here is a working program that consists dominantly of if control statements:

#include <math.h>
#define IGREG (15+31L*(10+12L*1582))     Gregorian Calendar adopted Oct. 15, 1582.

long julday(int mm, int id, int iyyy)
In this routine julday returns the Julian Day Number that begins at noon of the calendar date specified by month mm, day id, and year iyyy, all integer variables. Positive year signifies A.D.; negative, B.C. Remember that the year after 1 B.C. was 1 A.D.
{
	void nrerror(char error_text[]);

	long jul;
	int ja,jy=iyyy,jm;

	if (jy == 0) nrerror("julday: there is no year zero.");
	if (jy < 0) ++jy;
	if (mm > 2) {                             Here is an example of a block IF-structure.
		jm=mm+1;
	} else {
		--jy;
		jm=mm+13;
	}
	jul = (long) (floor(365.25*jy)+floor(30.6001*jm)+id+1720995);
	if (id+31L*(mm+12L*iyyy) >= IGREG) {      Test whether to change to Gregorian Calendar.
		ja=(int)(0.01*jy);
		jul += 2-ja+(int) (0.25*ja);
	}
	return jul;
}

(Astronomers number each 24-hour period, starting and ending at noon, with a unique integer, the Julian Day Number [7]. Julian Day Zero was a very long time ago; a convenient reference point is that Julian Day 2440000 began at noon of May 23, 1968. If you know the Julian Day Number that begins at noon of a given calendar date, then the day of the week of that date is obtained by adding 1 and taking the result modulo base 7; a zero answer corresponds to Sunday, 1 to Monday, ..., 6 to Saturday.)
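As a quick check of that day-of-week rule, here is a tiny example of ours (not one of the book's routines), using the reference date just quoted:

#include <stdio.h>

int main(void)
{
	long jd=2440000;               /* Julian Day that began at noon of May 23, 1968 */
	int dow=(int)((jd+1) % 7);     /* 0=Sunday, 1=Monday, ..., 6=Saturday */
	printf("%d\n",dow);            /* prints 4: a Thursday, as that date indeed was */
	return 0;
}

The same idiom appears in the badluk program below, where a value of 5 tests for a Friday.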

While iteration. Most languages (though not FORTRAN, incidentally) provide for structures like the following C example:

while (n < 1000) {
	n*=2;
	j+=1;
}

It is the particular feature of this structure that the control-clause (in this case n < 1000) is evaluated before each iteration. If the clause is not true, the enclosed statements will not be executed. In particular, if this code is encountered at a time when n is greater than or equal to 1000, the statements will not even be executed once.

Do-While iteration. Companion to the while iteration is a related control-structure that tests its control-clause at the end of each iteration. In C, it looks like this:

do {
	n*=2;
	j+=1;
} while (n < 1000);

In this case, the enclosed statements will be executed at least once, independent of the initial value of n.

Break. In this case, you have a loop that is repeated indefinitely until some condition tested somewhere in the middle of the loop (and possibly tested in more than one place) becomes true. At that point you wish to exit the loop and proceed with what comes after it. In C the structure is implemented with the simple break statement, which terminates execution of the innermost for, while, do, or switch construction and proceeds to the next sequential instruction. (In Pascal and standard FORTRAN, this structure requires the use of statement labels, to the detriment of clear programming.) A typical usage of the break statement is:

for(;;) {
	[statements before the test]
	if (...) break;
	[statements after the test]
}
[next sequential instruction]

Here is a program that uses several different iteration structures. One of us was once asked, for a scavenger hunt, to find the date of a Friday the 13th on which the moon was full. This is a program which accomplishes that task, giving incidentally all other Fridays the 13th as a by-product.

#include <stdio.h>
#include <math.h>
#define ZON -5.0          Time zone −5 is Eastern Standard Time.
#define IYBEG 1900        The range of dates to be searched.
#define IYEND 2000

int main(void)  /* Program badluk */
{
	void flmoon(int n, int nph, long *jd, float *frac);
	long julday(int mm, int id, int iyyy);
	int ic,icon,idwk,im,iyyy,n;
	float timzon = ZON/24.0,frac;
	long jd,jday;

	printf("\nFull moons on Friday the 13th from %5d to %5d\n",IYBEG,IYEND);
	for (iyyy=IYBEG;iyyy<=IYEND;iyyy++) {         Loop over each year,
		for (im=1;im<=12;im++) {                  and each month.
			jday=julday(im,13,iyyy);              Is the 13th a Friday?
			idwk=(int) ((jday+1) % 7);
			if (idwk == 5) {
				n=(int)(12.37*(iyyy-1900+(im-0.5)/12.0));
				This value n is a first approximation to how many full moons have
				occurred since 1900. We will feed it into the phase routine and
				adjust it up or down until we determine that our desired 13th was
				or was not a full moon. The variable icon signals the direction of
				adjustment.
				icon=0;
				for (;;) {
					flmoon(n,2,&jd,&frac);        Get date of full moon n.
					frac=24.0*(frac+timzon);      Convert to hours in correct time zone.
					if (frac < 0.0) {             Convert from Julian Days beginning at
						--jd;                     noon to civil days beginning at midnight.
						frac += 24.0;
					}
					if (frac > 12.0) {
						++jd;
						frac -= 12.0;
					} else frac += 12.0;
					if (jd == jday) {             Did we hit our target day?

						printf("\n%2d/13/%4d\n",im,iyyy);
						printf("%s %5.1f %s\n","Full moon",frac,
							" hrs after midnight (EST)");
						break;                    Part of the break-structure, a match.
					} else {                      Didn't hit it.
						ic=(jday >= jd ? 1 : -1);
						if (ic == (-icon)) break; Another break, case of no match.
						icon=ic;
						n += ic;
					}
				}
			}
		}
	}
	return 0;
}

If you are merely curious, there were (or will be) occurrences of a full moon on Friday the 13th (time zone GMT −5) on: 3/13/1903, 10/13/1905, 6/13/1919, 1/13/1922, 11/13/1970, 2/13/1987, 10/13/2000, 9/13/2019, and 8/13/2049. Our advice is to avoid them.

Other "standard" structures. Every programming language has some number of "goodies" that the designer just couldn't resist throwing in. They seemed like a good idea at the time. Unfortunately they don't stand the test of time! Your program becomes difficult to translate into other languages, and difficult to read (because rarely used structures are unfamiliar to the reader). You can almost always accomplish the supposed conveniences of these structures in other ways.

In C, the most problematic control structure is the switch ... case ... default construction (see Figure 1.1.1), which has historically been burdened by uncertainty, from compiler to compiler, about what data types are allowed in its control expression. Data types char and int are universally supported. For other data types, e.g., float or double, the structure should be replaced by a more recognizable and translatable if ... else construction. ANSI C allows the control expression to be of type long, but many older compilers do not. The continue; construction, while benign, can generally be replaced by an if construction with no loss of clarity.
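To illustrate the recommended replacement: suppose one wants to dispatch on a floating-point quantity, which switch cannot legally do at all. The chained if ... else form below is our sketch, not a book routine:

#include <stdio.h>

int main(void)
{
	double frac=0.25;                      /* some fractional quantity */
	if (frac < 0.0) printf("negative\n");
	else if (frac < 0.5) printf("first half\n");
	else printf("second half\n");
	return 0;
}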

About "Advanced Topics"

Material set in smaller type, like this, signals an "advanced topic," either one outside of the main argument of the chapter, or else one requiring of you more than the usual assumed mathematical background, or else (in a few cases) a discussion that is more speculative or an algorithm that is less well-tested. Nothing important will be lost if you skip the advanced topics on a first reading of the book.

You may have noticed that, by its looping over the months and years, the program badluk avoids using any algorithm for converting a Julian Day Number back into a calendar date. A routine for doing just this is not very interesting structurally, but it is occasionally useful:

#include <math.h>
#define IGREG 2299161

void caldat(long julian, int *mm, int *id, int *iyyy)
Inverse of the function julday given above. Here julian is input as a Julian Day Number, and the routine outputs mm, id, and iyyy as the month, day, and year on which the specified Julian Day started at noon.
{
	long ja,jalpha,jb,jc,jd,je;

	if (julian >= IGREG) {                    Cross-over to Gregorian Calendar produces this correction.
		jalpha=(long)(((double) (julian-1867216)-0.25)/36524.25);
		ja=julian+1+jalpha-(long) (0.25*jalpha);
	} else if (julian < 0) {                  Make day number positive by adding integer number of
		ja=julian+36525*(1-julian/36525);     Julian centuries, then subtract them off at the end.
	} else
		ja=julian;
	jb=ja+1524;
	jc=(long)(6680.0+((double) (jb-2439870)-122.1)/365.25);
	jd=(long)(365*jc+(0.25*jc));
	je=(long)((jb-jd)/30.6001);
	*id=jb-jd-(long) (30.6001*je);
	*mm=je-1;
	if (*mm > 12) *mm -= 12;
	*iyyy=jc-4715;
	if (*mm > 2) --(*iyyy);
	if (*iyyy <= 0) --(*iyyy);
	if (julian < 0) *iyyy -= 100*(1-julian/36525);
}

(For additional calendrical algorithms, applicable to various historical calendars, see [8].)

CITED REFERENCES AND FURTHER READING:
Harbison, S.P., and Steele, G.L., Jr. 1991, C: A Reference Manual, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall).
Kernighan, B.W. 1978, The Elements of Programming Style (New York: McGraw-Hill). [1]
Yourdon, E. 1975, Techniques of Program Structure and Design (Englewood Cliffs, NJ: Prentice-Hall). [2]
Jones, R., and Stewart, I. 1987, The Art of C Programming (New York: Springer-Verlag). [3]
Hoare, C.A.R. 1981, Communications of the ACM, vol. 24, pp. 75–83.
Wirth, N. 1983, Programming in Modula-2, 3rd ed. (New York: Springer-Verlag). [4]
Stroustrup, B. 1986, The C++ Programming Language (Reading, MA: Addison-Wesley). [5]
Borland International, Inc. 1989, Turbo Pascal 5.5 Object-Oriented Programming Guide (Scotts Valley, CA: Borland International). [6]
Meeus, J. 1982, Astronomical Formulae for Calculators, 2nd ed., revised and enlarged (Richmond, VA: Willmann-Bell). [7]
Hatcher, D.A. 1984, Quarterly Journal of the Royal Astronomical Society, vol. 25, pp. 53–55; see also op. cit. 1985, vol. 26, pp. 151–155, and 1986, vol. 27, pp. 506–507. [8]

1.2 Some C Conventions for Scientific Computing

The C language was devised originally for systems programming work, not for scientific computing. Relative to other high-level programming languages, C puts the programmer "very close to the machine" in several respects. It is operator-rich, giving direct access to most capabilities of a machine-language instruction set. It has a large variety of intrinsic data types (short and long, signed and unsigned integers; floating and double-precision reals; pointer types; etc.), and a concise syntax for effecting conversions and indirections. It defines an arithmetic on pointers (addresses) that relates gracefully to array addressing and is highly compatible with the index register structure of many computers.
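That last point is easily made concrete. In the fragment below (ours, not the book's), a[i] and *(a+i) mean the same thing by definition, so a pointer can walk an array directly:

#include <stdio.h>

int main(void)
{
	float a[5]={10.0,11.0,12.0,13.0,14.0};
	float *p=a+2;                    /* p points at a[2] */
	printf("%g %g\n",a[2],*p);       /* prints 12 12: a[i] is *(a+i) */
	printf("%g\n",*(p+1));           /* a[3], i.e., 13 */
	return 0;
}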

Portability has always been another strong point of the C language. C is the underlying language of the UNIX operating system; both the language and the operating system have by now been implemented on literally hundreds of different computers. The language's universality, portability, and flexibility have attracted increasing numbers of scientists and engineers to it. It is commonly used for the real-time control of experimental hardware, often in spite of the fact that the standard UNIX kernel is less than ideal as an operating system for this purpose.

The use of C for higher level scientific calculations such as data analysis, modeling, and floating-point numerical work has generally been slower in developing. In part this is due to the entrenched position of FORTRAN as the mother-tongue of virtually all scientists and engineers born before 1960, and most born after. In part, also, the slowness of C's penetration into scientific computing has been due to deficiencies in the language that computer scientists have been (we think, stubbornly) slow to recognize. Examples are the lack of a good way to raise numbers to small integer powers, and the "implicit conversion of float to double" issue, discussed below. Many, though not all, of these deficiencies are overcome in the ANSI C Standard. Some remaining deficiencies will undoubtedly disappear over time.
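The integer-power deficiency is easy to demonstrate. In this sketch of ours (not a book routine), pow() is correct but heavyweight machinery for a mere square; an explicit multiply, or a macro wrapping one, is the usual workaround. Beware that a naive macro like SQR below evaluates its argument twice, which is wrong for arguments with side effects:

#include <stdio.h>
#include <math.h>
#define SQR(a) ((a)*(a))     /* simplistic; do not call as SQR(x++) */

int main(void)
{
	double x=3.0;
	printf("%g %g %g\n",pow(x,2.0),x*x,SQR(x));     /* prints 9 9 9 */
	return 0;
}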
The easiest way to understand prototypes is by example. A function definition that would be written in traditional C as

int g(x,y,z)
int x,y;
float z;

becomes in ANSI C

int g(int x, int y, float z)

A function that has no parameters has the parameter type list void.

A function declaration (as contrasted to a function definition) is used to "introduce" a function to a routine that is going to call it. The calling routine needs to know the number and type of arguments and the type of the returned value. In a function declaration, you are allowed to omit the parameter names. Thus the declaration for the above function is allowed to be written

int g(int, int, float);

If a C program consists of multiple source files, the compiler cannot check the consistency of each function call without some additional assistance. The safest way to proceed is as follows (a minimal illustration follows this list):

• Every external function should have a single prototype declaration in a header (.h) file.
• The source file with the definition (body) of the function should also include the header file so that the compiler can check that the prototypes in the declaration and the definition match.
• Every source file that calls the function should include the appropriate header (.h) file.
• Optionally, a routine that calls a function can also include that function's prototype declaration internally. This is often useful when you are developing a program, since it gives you a visible reminder (checked by the compiler through the common .h file) of a function's argument types. Later, after your program is debugged, you can go back and delete the supernumerary internal declarations.

For the routines in this book, the header file containing all the prototypes is nr.h, listed in Appendix A. You should put the statement #include "nr.h" at the top of every source file that contains Numerical Recipes routines. Since, more frequently than not, you will want to include more than one Numerical Recipes routine in a single source file, we have not printed this #include statement in front of this book's individual program listings, but you should make sure that it is present in your programs.

As backup, and in accordance with the last item on the indented list above, we declare the function prototype of all Numerical Recipes routines that are called by other Numerical Recipes routines internally to the calling routine. (That also makes our routines much more readable.) The only exception to this rule is that the small number of utility routines that we use repeatedly (described below) are declared in the additional header file nrutil.h, and the line #include "nrutil.h" is explicitly printed whenever it is needed.

A final important point about the header file nr.h is that, as furnished on the diskette, it contains both ANSI C and traditional K&R-style declarations. The ANSI forms are invoked if any of the following macros are defined: __STDC__, ANSI, or NRANSI. (The purpose of the last name is to give you an invocation that does not conflict with other possible uses of the first two names.) If you have an ANSI compiler, it is essential that you invoke it with one or more of these macros defined. The typical means for doing so is to include a switch like "-DANSI" on the compiler command line.
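Here is the minimal illustration of the header-file discipline promised above, with hypothetical file and function names (not from nr.h):

/* myutils.h -- single prototype declaration for the external function */
#ifndef MYUTILS_H
#define MYUTILS_H
int g(int x, int y, float z);
#endif

/* myutils.c -- the definition; it includes the header so the compiler
   can check that declaration and definition agree */
#include "myutils.h"
int g(int x, int y, float z)
{
    return x + y + (int) z;
}

/* main.c -- a caller; it includes the same header */
#include <stdio.h>
#include "myutils.h"
int main(void)
{
    printf("%d\n", g(1, 2, 3.5f));  /* argument types checked against prototype */
    return 0;
}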

Some further details about the file nr.h are given in Appendix A.

Vectors and One-Dimensional Arrays

There is a close, and elegant, correspondence in C between pointers and arrays. The value referenced by an expression like a[j] is defined to be *((a)+(j)), that is, "the contents of the address obtained by incrementing the pointer a by j." A consequence of this definition is that if a points to a legal data location, the array element a[0] is always defined. Arrays in C are natively "zero-origin" or "zero-offset." An array declared by the statement float b[4]; has the valid references b[0], b[1], b[2], and b[3], but not b[4].

Right away we need a notation to indicate what is the valid range of an array index. (The issue comes up about a thousand times in this book!) For the above example, the index range of b will be henceforth denoted b[0..3], a notation borrowed from Pascal. In general, the range of an array declared by float a[M]; is a[0..M-1], and the same if float is replaced by any other data type.

One problem is that many algorithms naturally like to go from 1 to M, not from 0 to M-1. Sure, you can always convert them, but they then often acquire a baggage of additional arithmetic in array indices that is, at best, distracting. It is better to use the power of the C language, in a consistent way, to make the problem disappear. Consider

float b[4],*bb;
bb=b-1;

The pointer bb now points one location before b. An immediate consequence is that the array elements bb[1], bb[2], bb[3], and bb[4] all exist. In other words the range of bb is bb[1..4]. We will refer to bb as a unit-offset vector. (See Appendix B for some additional discussion of technical details.)

It is sometimes convenient to use zero-offset vectors, and sometimes convenient to use unit-offset vectors in algorithms. The choice should be whichever is most natural to the problem at hand. For example, the coefficients of a polynomial $a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ clearly cry out for the zero-offset a[0..n], while a vector of N data points $x_i,\ i = 1 \ldots N$ calls for a unit-offset x[1..N]. When a routine in this book has an array as an argument, its header comment always gives the expected index range. For example,

void someroutine(float bb[], int nn)
This routine does something with the vector bb[1..nn].
...

Now, suppose you want someroutine() to do its thing on your own vector, of length 7, say. If your vector, call it aa, is already unit-offset (has the valid range aa[1..7]), then you can invoke someroutine(aa,7); in the obvious way. That is the recommended procedure, since someroutine() presumably has some logical, or at least aesthetic, reason for wanting a unit-offset vector.
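Putting the pieces together, here is a tiny compilable sketch; sum1 is a hypothetical stand-in for someroutine():

#include <stdio.h>

/* Hypothetical routine expecting a unit-offset vector v[1..n]. */
float sum1(float v[], int n)
{
    float s = 0.0f;
    int i;
    for (i = 1; i <= n; i++) s += v[i];
    return s;
}

int main(void)
{
    float b[4], *bb;
    int i;
    for (i = 0; i < 4; i++) b[i] = (float) (i + 1);  /* b[0..3] = 1,2,3,4 */
    bb = b - 1;   /* the book's idiom; strictly speaking b-1 points outside
                     the array, but it behaves as intended on the machines
                     the book targets */
    printf("%g\n", sum1(bb, 4));   /* prints 10 */
    return 0;
}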

But suppose that your vector of length 7, now call it a, is perversely a native C, zero-offset array (has range a[0..6]). Perhaps this is the case because you disagree with our aesthetic prejudices, Heaven help you! To use our recipe, do you have to copy a's contents element by element into another, unit-offset vector? No! Do you have to declare a new pointer aaa and set it equal to a-1? No! You simply invoke someroutine(a-1,7);. Then a[1], as seen from within our recipe, is actually a[0] as seen from your program. In other words, you can change conventions "on the fly" with just a couple of keystrokes.

Forgive us for belaboring these points. We want to free you from the zero-offset thinking that C encourages but (as we see) does not require. A final liberating point is that the utility file nrutil.c, listed in full in Appendix B, includes functions for allocating (using malloc()) arbitrary-offset vectors of arbitrary lengths. The synopses of these functions are as follows:

float *vector(long nl, long nh)
    Allocates a float vector with range [nl..nh].

int *ivector(long nl, long nh)
    Allocates an int vector with range [nl..nh].

unsigned char *cvector(long nl, long nh)
    Allocates an unsigned char vector with range [nl..nh].

unsigned long *lvector(long nl, long nh)
    Allocates an unsigned long vector with range [nl..nh].

double *dvector(long nl, long nh)
    Allocates a double vector with range [nl..nh].

A typical use of the above utilities is the declaration float *b; followed by b=vector(1,7);, which makes the range b[1..7] come into existence and allows b to be passed to any function calling for a unit-offset vector.

The file nrutil.c also contains the corresponding deallocation routines,

void free_vector(float *v, long nl, long nh)
void free_ivector(int *v, long nl, long nh)
void free_cvector(unsigned char *v, long nl, long nh)
void free_lvector(unsigned long *v, long nl, long nh)
void free_dvector(double *v, long nl, long nh)

with the typical use being free_vector(b,1,7);.

Our recipes use the above utilities extensively for the allocation and deallocation of vector workspace. We also commend them to you for use in your main programs or other procedures. Note that if you want to allocate vectors of length longer than 64k on an IBM PC-compatible computer, you should replace all occurrences of malloc in nrutil.c by your compiler's special-purpose memory allocation function. This applies also to matrix allocation, to be discussed next.
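For the curious, here is a simplified sketch of how such an offset allocator can be written. It is only illustrative: the actual vector() in Appendix B adds error reporting through nrerror() and reserves a small extra safety offset.

#include <stdlib.h>

/* Simplified sketch of an arbitrary-offset allocator in the spirit of
   nrutil.c's vector(); not the book's actual code. */
float *vector(long nl, long nh)
{
    float *v = (float *) malloc((size_t) (nh - nl + 1) * sizeof(float));
    if (!v) abort();        /* the book's version calls nrerror() instead */
    return v - nl;          /* so that v[nl..nh] are the valid elements */
}

void free_vector(float *v, long nl, long nh)
{
    (void) nh;              /* unused in this simplified form */
    free(v + nl);           /* undo the offset before freeing */
}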

Matrices and Two-Dimensional Arrays

The zero- versus unit-offset issue arises here, too. Let us, however, defer it for a moment in favor of an even more fundamental matter, that of variable dimension arrays (FORTRAN terminology) or conformant arrays (Pascal terminology). These are arrays that need to be passed to a function along with real-time information about their two-dimensional size. The systems programmer rarely deals with two-dimensional arrays, and almost never deals with two-dimensional arrays whose size is variable and known only at run time. Such arrays are, however, the bread and butter of scientific computing. Imagine trying to live with a matrix inversion routine that could work with only one size of matrix!

There is no technical reason that a C compiler could not allow a syntax like

void someroutine(a,m,n)
float a[m][n]; /* ILLEGAL DECLARATION */

and emit code to evaluate the variable dimensions m and n (or any variable-dimension expression) each time someroutine() is entered. Alas! the above fragment is forbidden by the C language definition. The implementation of variable dimensions in C instead requires some additional finesse; however, we will see that one is rewarded for the effort.

There is a subtle near-ambiguity in the C syntax for two-dimensional array references. Let us elucidate it, and then turn it to our advantage. Consider the array reference to a (say) float value a[i][j], where i and j are expressions that evaluate to type int. A C compiler will emit quite different machine code for this reference, depending on how the identifier a has been declared. If a has been declared as a fixed-size array, e.g., float a[5][9];, then the machine code is: "to the address a add 9 times i, then add j, return the value thus addressed." Notice that the constant 9 needs to be known in order to effect the calculation, and an integer multiplication is required (see Figure 1.2.1).

Suppose, on the other hand, that a has been declared by float **a;. Then the machine code for a[i][j] is: "to the address of a add i, take the value thus addressed as a new address, add j to it, return the value addressed by this new address." Notice that the underlying size of a[][] does not enter this calculation at all, and that there is no multiplication; an additional indirection replaces it. We thus have, in general, a faster and more versatile scheme than the previous one. The price that we pay is the storage requirement for one array of pointers (to the rows of a[][]), and the slight inconvenience of remembering to initialize those pointers when we declare an array.

Here is our bottom line: We avoid the fixed-size two-dimensional arrays of C as being unsuitable data structures for representing matrices in scientific computing. We adopt instead the convention "pointer to array of pointers," with the elements of the pointer array pointing to the rows of the matrix. Figure 1.2.1 contrasts the rejected and adopted schemes.

Figure 1.2.1. Two storage schemes for a matrix m. Dotted lines denote address reference, while solid lines connect sequential memory locations. (a) Pointer to a fixed size two-dimensional array. (b) Pointer to an array of pointers to rows; this is the scheme adopted in this book.

The following fragment shows how a fixed-size array a of size 13 by 9 is converted to a "pointer to array of pointers" reference aa:

float a[13][9],**aa;
int i;
aa=(float **) malloc((unsigned) 13*sizeof(float*));
for(i=0;i<=12;i++) aa[i]=a[i];    /* a[i] is a pointer to a[i][0] */

The identifier aa is now a matrix with index range aa[0..12][0..8]. You can use aa, or modify its elements, ad lib, and more importantly you can pass it as an argument to any function by its name aa. That function, which declares the corresponding dummy argument as float **aa;, can address its elements as aa[i][j] without knowing its physical size.

You may rightly not wish to clutter your programs with code like the above fragment. Also, there is still the outstanding problem of how to treat unit-offset indices, so that (for example) the above matrix aa could be addressed with the range aa[1..13][1..9]. Both of these problems are solved by additional utility routines in nrutil.c (Appendix B) which allocate and deallocate matrices of arbitrary range. The synopses are

float **matrix(long nrl, long nrh, long ncl, long nch)
    Allocates a float matrix with range [nrl..nrh][ncl..nch].

double **dmatrix(long nrl, long nrh, long ncl, long nch)
    Allocates a double matrix with range [nrl..nrh][ncl..nch].

int **imatrix(long nrl, long nrh, long ncl, long nch)
    Allocates an int matrix with range [nrl..nrh][ncl..nch].

void free_matrix(float **m, long nrl, long nrh, long ncl, long nch)
    Frees a matrix allocated with matrix.

void free_dmatrix(double **m, long nrl, long nrh, long ncl, long nch)
    Frees a matrix allocated with dmatrix.

void free_imatrix(int **m, long nrl, long nrh, long ncl, long nch)
    Frees a matrix allocated with imatrix.

A typical use is

float **a;
a=matrix(1,13,1,9);
...
a[3][5]=...
...+a[2][9]/3.0...
someroutine(a,...);
...
free_matrix(a,1,13,1,9);

All matrices in Numerical Recipes are handled with the above paradigm, and we commend it to you.

Some further utilities for handling matrices are also included in nrutil.c. The first is a function submatrix() that sets up a new pointer reference to an already-existing matrix (or sub-block thereof), along with new offsets if desired. Its synopsis is

float **submatrix(float **a, long oldrl, long oldrh, long oldcl, long oldch, long newrl, long newcl)
    Point a submatrix [newrl..newrl+(oldrh-oldrl)][newcl..newcl+(oldch-oldcl)] to the existing matrix range a[oldrl..oldrh][oldcl..oldch].

Here oldrl and oldrh are respectively the lower and upper row indices of the original matrix that are to be represented by the new matrix, oldcl and oldch are the corresponding column indices, and newrl and newcl are the lower row and column indices for the new matrix. (We don't need upper row and column indices, since they are implied by the quantities already given.)

Two sample uses might be, first, to select as a 2 × 2 submatrix b[1..2][1..2] some interior range of an existing matrix, say a[4..5][2..3],

float **a,**b;
a=matrix(1,13,1,9);
...
b=submatrix(a,4,5,2,3,1,1);

and second, to map an existing matrix a[1..13][1..9] into a new matrix b[0..12][0..8],

float **a,**b;
a=matrix(1,13,1,9);
...
b=submatrix(a,1,13,1,9,0,0);
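Before going on, here is a simplified sketch of how an allocator like matrix() can work internally. It is illustrative only: the Appendix B version adds error reporting and an NR_END safety offset, but shares the key design choice of allocating all rows in one contiguous data block, so a matrix costs just two malloc() calls.

#include <stdlib.h>

/* Simplified sketch in the style of nrutil.c's matrix(): one block for
   the row pointers, one contiguous block for the data. Not the book's
   actual code. */
float **matrix(long nrl, long nrh, long ncl, long nch)
{
    long i, nrow = nrh - nrl + 1, ncol = nch - ncl + 1;
    float **m = (float **) malloc((size_t) nrow * sizeof(float *));
    if (!m) abort();                       /* the book's version calls nrerror() */
    m -= nrl;                              /* m[nrl..nrh] */
    m[nrl] = (float *) malloc((size_t) (nrow * ncol) * sizeof(float));
    if (!m[nrl] + ncl == 0) {}             /* (placeholder removed below) */
    m[nrl] -= ncl;                         /* m[nrl][ncl..nch] */
    for (i = nrl + 1; i <= nrh; i++)       /* remaining rows follow contiguously */
        m[i] = m[i-1] + ncol;
    return m;
}

void free_matrix(float **m, long nrl, long nrh, long ncl, long nch)
{
    free(m[nrl] + ncl);                    /* data block */
    free(m + nrl);                         /* row pointers; nrh, nch unused here */
}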

Incidentally, you can use submatrix() for matrices of any type whose sizeof() is the same as sizeof(float) (often true for int, e.g.); just cast the first argument to type float ** and cast the result to the desired type, e.g., int **.

The function

void free_submatrix(float **b, long nrl, long nrh, long ncl, long nch)

frees the array of row-pointers allocated by submatrix(). Note that it does not free the memory allocated to the data in the submatrix, since that space still lies within the memory allocation of some original matrix.

Finally, if you have a standard C matrix declared as a[nrow][ncol], and you want to convert it into a matrix declared in our pointer-to-row-of-pointers manner, the following function does the trick:

float **convert_matrix(float *a, long nrl, long nrh, long ncl, long nch)
    Allocate a float matrix m[nrl..nrh][ncl..nch] that points to the matrix declared in the standard C manner as a[nrow][ncol], where nrow=nrh-nrl+1 and ncol=nch-ncl+1. The routine should be called with the address &a[0][0] as the first argument.

(You can use this function when you want to make use of C's initializer syntax to set values for a matrix, but then be able to pass the matrix to programs in this book.) The function

void free_convert_matrix(float **b, long nrl, long nrh, long ncl, long nch)
    Free a matrix allocated by convert_matrix().

frees the allocation, without affecting the original matrix a.

The only examples of allocating a three-dimensional array as a pointer-to-pointer-to-pointer structure in this book are found in the routines rlft3 in §12.5 and sfroid in §17.4. The necessary allocation and deallocation functions are

float ***f3tensor(long nrl, long nrh, long ncl, long nch, long ndl, long ndh)
    Allocate a float 3-dimensional array with subscript range [nrl..nrh][ncl..nch][ndl..ndh].

void free_f3tensor(float ***t, long nrl, long nrh, long ncl, long nch, long ndl, long ndh)
    Free a float 3-dimensional array allocated by f3tensor().

Complex Arithmetic

C does not have complex data types, or predefined arithmetic operations on complex numbers. That omission is easily remedied with the set of functions in the file complex.c which is printed in full in Appendix C at the back of the book. A synopsis is as follows:

typedef struct FCOMPLEX {float r,i;} fcomplex;

fcomplex Cadd(fcomplex a, fcomplex b)
    Returns the complex sum of two complex numbers.

fcomplex Csub(fcomplex a, fcomplex b)
    Returns the complex difference of two complex numbers.

fcomplex Cmul(fcomplex a, fcomplex b)
    Returns the complex product of two complex numbers.

fcomplex Cdiv(fcomplex a, fcomplex b)
    Returns the complex quotient of two complex numbers.

fcomplex Csqrt(fcomplex z)
    Returns the complex square root of a complex number.

fcomplex Conjg(fcomplex z)
    Returns the complex conjugate of a complex number.

float Cabs(fcomplex z)
    Returns the absolute value (modulus) of a complex number.

fcomplex Complex(float re, float im)
    Returns a complex number with specified real and imaginary parts.

fcomplex RCmul(float x, fcomplex a)
    Returns the complex product of a real number and a complex number.

The implementation of several of these complex operations in floating-point arithmetic is not entirely trivial; see §5.4.

Only about half a dozen routines in this book make explicit use of these complex arithmetic functions. The resulting code is not as readable as one would like, because the familiar operations + - * / are replaced by function calls. The C++ extension to the C language allows operators to be redefined. That would allow more readable code. However, in this book we are committed to standard C.

We should mention that the above functions assume the ability to pass, return, and assign structures like FCOMPLEX (or types such as fcomplex that are defined to be structures) by value. All recent C compilers have this ability, but it is not in the original K&R C definition. If you are missing it, you will have to rewrite the functions in complex.c, making them pass and return pointers to variables of type fcomplex instead of the variables themselves. Likewise, you will need to modify the recipes that use the functions.

Several other routines (e.g., the Fourier transforms four1 and fourn) do complex arithmetic "by hand," that is, they carry around real and imaginary parts as float variables. This results in more efficient code than would be obtained by using the functions in complex.c. But the code is even less readable. There is simply no ideal solution to the complex arithmetic problem in C.
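A short usage sketch of the functions above; we assume a header (here called complex.h) declaring the complex.c prototypes, and the computed product is (1+2i)(3+4i) = -5+10i:

#include <stdio.h>
#include "complex.h"   /* assumed header declaring the complex.c functions */

int main(void)
{
    fcomplex a = Complex(1.0f, 2.0f);
    fcomplex b = Complex(3.0f, 4.0f);
    fcomplex c = Cmul(a, b);                  /* (1+2i)(3+4i) = -5+10i */
    printf("c = %g + %gi, |c| = %g\n", c.r, c.i, Cabs(c));
    return 0;
}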

Implicit Conversion of Float to Double

In traditional (K&R) C, float variables are automatically converted to double before any operation is attempted, including both arithmetic operations and passing as arguments to functions. All arithmetic is then done in double precision. If a float variable receives the result of such an arithmetic operation, the high precision is immediately thrown away. A corollary of these rules is that all the real-number standard C library functions are of type double and compute to double precision.

The justification for these conversion rules is, "well, there's nothing wrong with a little extra precision, and this way the libraries need only one version of each function." One does not need much experience in scientific computing to recognize that the implicit conversion rules are, in fact, sheer madness! In effect, they make it impossible to write efficient numerical programs. One of the cultural barriers that separates computer scientists from "regular" scientists and engineers is a differing point of view on whether a 30% or 50% loss of speed is worth worrying about. In many real-time or state-of-the-art scientific applications, such a loss is catastrophic. The practical scientist is trying to solve tomorrow's problem with yesterday's computer; the computer scientist, we think, often has it the other way around.

The ANSI C standard happily does not allow implicit conversion for arithmetic operations, but it does require it for function arguments, unless the function is fully prototyped by an ANSI declaration as described earlier in this section. That is another reason for our being rigorous about using the ANSI prototype mechanism, and a good reason for you to use an ANSI-compatible compiler.

Some older C compilers do provide an optional compilation mode in which the implicit conversion of float to double is suppressed. Use this if you can. In this book, when we write float, we mean float; when we write double, we mean double, i.e., there is a good algorithmic reason for having higher precision. Our routines all can tolerate the traditional implicit conversion rules, but they are more efficient without them. Of course, if your application actually requires double precision, you can change our declarations from float to double without difficulty. (The brute force approach is to add a preprocessor statement #define float double !)

A Few Wrinkles

We like to keep code compact, avoiding unnecessary spaces unless they add immediate clarity. We usually don't put space around the assignment operator "=". Through a quirk of history, however, some C compilers recognize the (nonexistent) operator "=-" as being equivalent to the subtractive assignment operator "-=", and "=*" as being the same as the multiplicative assignment operator "*=". That is why you will see us write y= -10.0; or y=(-10.0);, and y= *a; or y=(*a);.

We have the same viewpoint regarding unnecessary parentheses. You can't write (or effectively read) C unless you memorize its operator precedence and associativity rules. Please study the accompanying table while you brush your teeth every night.

We never use the register storage class specifier. Good optimizing compilers are quite sophisticated in making their own decisions about what to keep in registers, and the best choices are sometimes rather counterintuitive.

Different compilers use different methods of distinguishing between defining and referencing declarations of the same external name in several files. We follow the most common scheme, which is also the ANSI standard. The storage class extern is explicitly included on all referencing top-level declarations. The storage class is omitted from the single defining declaration for each external variable. We have commented these declarations, so that if your compiler uses a different scheme you can change the code. The various schemes are discussed in §4.8 of [1].
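A minimal illustration of the defining/referencing convention, with a hypothetical variable name of our own choosing:

/* globals.h -- referencing declaration, included by every file that
   uses the variable; storage class extern is explicit */
extern int nglobal;

/* globals.c -- exactly one defining declaration, storage class omitted */
int nglobal = 0;

/* Any other .c file simply includes globals.h and uses nglobal. */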

Operator Precedence and Associativity Rules in C

()      function call                           left-to-right
[]      array element
.       structure or union member
->      pointer reference to structure

!       logical not                             right-to-left
~       bitwise complement
-       unary minus
++      increment
--      decrement
&       address of
*       contents of
(type)  cast to type
sizeof  size in bytes

*       multiply                                left-to-right
/       divide
%       remainder

+       add                                     left-to-right
-       subtract

<<      bitwise left shift                      left-to-right
>>      bitwise right shift

<       arithmetic less than                    left-to-right
>       arithmetic greater than
<=      arithmetic less than or equal to
>=      arithmetic greater than or equal to

==      arithmetic equal                        left-to-right
!=      arithmetic not equal

&       bitwise and                             left-to-right
^       bitwise exclusive or                    left-to-right
|       bitwise or                              left-to-right
&&      logical and                             left-to-right
||      logical or                              left-to-right
?:      conditional expression                  right-to-left
=       assignment operator                     right-to-left
        also += -= *= /= %= <<= >>= &= ^= |=
,       sequential expression                   left-to-right

We have already alluded to the problem of computing small integer powers of numbers, most notably the square and cube. The omission of this operation from C is perhaps the language's most galling insult to the scientific programmer. All good FORTRAN compilers recognize expressions like (A+B)**4 and produce in-line code, in this case with only one add and two multiplies. It is typical for constant integer powers up to 12 to be thus recognized.
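For comparison, here is roughly what such in-line code looks like when written out by hand in C; the function name pow4 is ours, for illustration only:

/* (a+b)**4 with one add and two multiplies, the same operation count
   a good FORTRAN compiler emits in-line */
static float pow4(float a, float b)
{
    float t = a + b;        /* one add */
    t = t * t;              /* (a+b)^2 */
    return t * t;           /* (a+b)^4: two multiplies in all */
}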

In C, the mere problem of squaring is hard enough! Some people "macro-ize" the operation as

#define SQR(a) ((a)*(a))

However, this is likely to produce code where SQR(sin(x)) results in two calls to the sine routine! You might be tempted to avoid this by storing the argument of the squaring function in a temporary variable:

static float sqrarg;
#define SQR(a) (sqrarg=(a),sqrarg*sqrarg)

The global variable sqrarg now has (and needs to keep) scope over the whole module, which is a little dangerous. Also, one needs a completely different macro to square expressions of type int. More seriously, this macro can fail if there are two SQR operations in a single expression. Since the order of evaluation of pieces of the expression is at the compiler's discretion, the value of sqrarg in one evaluation of SQR can be that from the other evaluation in the same expression, producing nonsensical results. When we need a guaranteed-correct SQR macro, we use the following, which exploits the guaranteed complete evaluation of subexpressions in a conditional expression:

static float sqrarg;
#define SQR(a) ((sqrarg=(a)) == 0.0 ? 0.0 : sqrarg*sqrarg)

A collection of macros for other simple operations is included in the file nrutil.h (see Appendix B) and used by many of our programs. Here are the synopses:

SQR(a)       Square a float value.
DSQR(a)      Square a double value.
FMAX(a,b)    Maximum of two float values.
FMIN(a,b)    Minimum of two float values.
DMAX(a,b)    Maximum of two double values.
DMIN(a,b)    Minimum of two double values.
IMAX(a,b)    Maximum of two int values.
IMIN(a,b)    Minimum of two int values.
LMAX(a,b)    Maximum of two long values.
LMIN(a,b)    Minimum of two long values.
SIGN(a,b)    Magnitude of a times sign of b.

Scientific programming in C may someday become a bed of roses; for now, watch out for the thorns!

CITED REFERENCES AND FURTHER READING:

Harbison, S.P., and Steele, G.L., Jr. 1991, C: A Reference Manual, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall). [1]

AT&T Bell Laboratories 1985, The C Programmer's Handbook (Englewood Cliffs, NJ: Prentice-Hall).

Kernighan, B., and Ritchie, D. 1978, The C Programming Language (Englewood Cliffs, NJ: Prentice-Hall). [Reference for K&R "traditional" C. Later editions of this book conform to the ANSI C standard.]

Hogan, T. 1984, The C Programmer's Handbook (Bowie, MD: Brady Communications).

1.3 Error, Accuracy, and Stability

Although we assume no prior training of the reader in formal numerical analysis, we will need to presume a common understanding of a few key concepts. We will define these briefly in this section.

Computers store numbers not with infinite precision but rather in some approximation that can be packed into a fixed number of bits (binary digits) or bytes (groups of 8 bits). Almost all computers allow the programmer a choice among several different such representations or data types. Data types can differ in the number of bits utilized (the wordlength), but also in the more fundamental respect of whether the stored number is represented in fixed-point (int or long) or floating-point (float or double) format.

A number in integer representation is exact. Arithmetic between numbers in integer representation is also exact, with the provisos that (i) the answer is not outside the range of (usually, signed) integers that can be represented, and (ii) that division is interpreted as producing an integer result, throwing away any integer remainder.

In floating-point representation, a number is represented internally by a sign bit s (interpreted as plus or minus), an exact integer exponent e, and an exact positive integer mantissa M. Taken together these represent the number

$$s \times M \times B^{e-E} \qquad (1.3.1)$$

where B is the base of the representation (usually B = 2, but sometimes B = 16), and E is the bias of the exponent, a fixed integer constant for any given machine and representation. An example is shown in Figure 1.3.1.

Several floating-point bit patterns can represent the same number. If B = 2, for example, a mantissa with leading (high-order) zero bits can be left-shifted, i.e., multiplied by a power of 2, if the exponent is decreased by a compensating amount. Bit patterns that are "as left-shifted as they can be" are termed normalized. Most computers always produce normalized results, since these don't waste any bits of the mantissa and thus allow a greater accuracy of the representation. Since the high-order bit of a properly normalized mantissa (when B = 2) is always one, some computers don't store this bit at all, giving one extra bit of significance.

Arithmetic among numbers in floating-point representation is not exact, even if the operands happen to be exactly represented (i.e., have exact values in the form of equation 1.3.1). For example, two floating numbers are added by first right-shifting (dividing by two) the mantissa of the smaller (in magnitude) one, simultaneously increasing its exponent, until the two operands have the same exponent. Low-order (least significant) bits of the smaller operand are lost by this shifting. If the two operands differ too greatly in magnitude, then the smaller operand is effectively replaced by zero, since it is right-shifted to oblivion.
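A two-line experiment, a sketch of ours assuming IEEE-style 32-bit floats, makes the right-shifted-to-oblivion effect visible:

#include <stdio.h>

int main(void)
{
    float big = 3.0f, tiny = 1.0e-7f;
    /* tiny is representable on its own, but its bits are shifted
       away entirely when aligned against big's exponent */
    printf("%d\n", (float) (big + tiny) == big);  /* prints 1 on such machines */
    return 0;
}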
The smallest (in magnitude) floating-point number which, when added to the floating-point number 1.0, produces a floating-point result different from 1.0 is termed the machine accuracy $\epsilon_m$. A typical computer with B = 2 and a 32-bit wordlength has $\epsilon_m$ around $3 \times 10^{-8}$. (A more detailed discussion of machine characteristics, and a program to determine them, is given in §20.1.)
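In the same spirit as the §20.1 program, though far simpler than it, one can estimate the machine accuracy by halving until 1.0 stops changing; a sketch:

#include <stdio.h>

int main(void)
{
    float eps = 1.0f;
    /* halve until adding eps/2 to 1.0 no longer changes it; the
       surviving eps estimates the machine accuracy */
    while ((float) (1.0f + 0.5f * eps) != 1.0f)
        eps *= 0.5f;
    printf("approximate machine accuracy: %g\n", eps);  /* ~1.2e-7 on IEEE floats */
    return 0;
}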

Figure 1.3.1. Floating point representations of numbers in a typical 32-bit (4-byte) format: a sign bit, an 8-bit exponent, and a 23-bit mantissa (whose leading bit may be a "phantom" bit, not stored). (a) The number 1/2 (note the bias in the exponent); (b) the number 3; (c) the number 1/4; (d) the number 10^-7, represented to machine accuracy; (e) the same number 10^-7, but shifted so as to have the same exponent as the number 3; with this shifting, all significance is lost and 10^-7 becomes zero; shifting to a common exponent must occur before two numbers can be added; (f) sum of the numbers 3 + 10^-7, which equals 3 to machine accuracy. Even though 10^-7 can be represented accurately by itself, it cannot accurately be added to a much larger number.

Roughly speaking, the machine accuracy $\epsilon_m$ is the fractional accuracy to which floating-point numbers are represented, corresponding to a change of one in the least significant bit of the mantissa. Pretty much any arithmetic operation among floating numbers should be thought of as introducing an additional fractional error of at least $\epsilon_m$. This type of error is called roundoff error.

It is important to understand that $\epsilon_m$ is not the smallest floating-point number that can be represented on a machine. That number depends on how many bits there are in the exponent, while $\epsilon_m$ depends on how many bits there are in the mantissa.

Roundoff errors accumulate with increasing amounts of calculation. If, in the course of obtaining a calculated value, you perform N such arithmetic operations, you might be so lucky as to have a total roundoff error on the order of $\sqrt{N}\,\epsilon_m$, if the roundoff errors come in randomly up or down. (The square root comes from a random-walk.) However, this estimate can be very badly off the mark for two reasons:

(i) It very frequently happens that the regularities of your calculation, or the peculiarities of your computer, cause the roundoff errors to accumulate preferentially in one direction. In this case the total will be of order $N \epsilon_m$.

(ii) Some especially unfavorable occurrences can vastly increase the roundoff error of single operations. Generally these can be traced to the subtraction of two very nearly equal numbers, giving a result whose only significant bits are those (few) low-order ones in which the operands differed. You might think that such a "coincidental" subtraction is unlikely to occur. Not always so.
Some mathematical expressions magnify its probability of occurrence tremendously. For example, in the familiar formula for the solution of a quadratic equation,

$$x = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad (1.3.2)$$

the addition becomes delicate and roundoff-prone whenever $ac \ll b^2$. (In §5.6 we will learn how to avoid the problem in this particular case.)
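A small demonstration of the cancellation, using float coefficients with $ac \ll b^2$; the rearranged root in the comment is the standard remedy of the kind §5.6 develops:

#include <stdio.h>
#include <math.h>

int main(void)
{
    float a = 1.0f, b = -1000.0f, c = 0.001f;   /* true roots near 1000 and 1e-6 */
    float s = sqrtf(b * b - 4.0f * a * c);
    float x1 = (-b + s) / (2.0f * a);           /* fine: no cancellation */
    float x2 = (-b - s) / (2.0f * a);           /* delicate: -b and s nearly cancel */
    float x2_good = c / (a * x1);               /* product of roots is c/a, so this
                                                   form avoids the subtraction */
    printf("naive x2 = %g, stable x2 = %g\n", x2, x2_good);  /* 0 versus ~1e-6 */
    return 0;
}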

Roundoff error is a characteristic of computer hardware. There is another, different, kind of error that is a characteristic of the program or algorithm used, independent of the hardware on which the program is executed. Many numerical algorithms compute "discrete" approximations to some desired "continuous" quantity. For example, an integral is evaluated numerically by computing a function at a discrete set of points, rather than at "every" point. Or, a function may be evaluated by summing a finite number of leading terms in its infinite series, rather than all infinity terms. In cases like this, there is an adjustable parameter, e.g., the number of points or of terms, such that the "true" answer is obtained only when that parameter goes to infinity. Any practical calculation is done with a finite, but sufficiently large, choice of that parameter.

The discrepancy between the true answer and the answer obtained in a practical calculation is called the truncation error. Truncation error would persist even on a hypothetical, "perfect" computer that had an infinitely accurate representation and no roundoff error. As a general rule there is not much that a programmer can do about roundoff error, other than to choose algorithms that do not magnify it unnecessarily (see discussion of "stability" below). Truncation error, on the other hand, is entirely under the programmer's control. In fact, it is only a slight exaggeration to say that clever minimization of truncation error is practically the entire content of the field of numerical analysis!

Most of the time, truncation error and roundoff error do not strongly interact with one another. A calculation can be imagined as having, first, the truncation error that it would have if run on an infinite-precision computer, "plus" the roundoff error associated with the number of operations performed.

Sometimes, however, an otherwise attractive method can be unstable. This means that any roundoff error that becomes "mixed into" the calculation at an early stage is successively magnified until it comes to swamp the true answer. An unstable method would be useful on a hypothetical, perfect computer; but in this imperfect world it is necessary for us to require that algorithms be stable, or, if unstable, that we use them with great caution.

Here is a simple, if somewhat artificial, example of an unstable algorithm: Suppose that it is desired to calculate all integer powers of the so-called "Golden Mean," the number given by

$$\phi \equiv \frac{\sqrt{5} - 1}{2} \approx 0.61803398 \qquad (1.3.3)$$

It turns out (you can easily verify) that the powers $\phi^n$ satisfy a simple recursion relation,

$$\phi^{n+1} = \phi^{n-1} - \phi^{n} \qquad (1.3.4)$$
Thus, knowing the first two values $\phi^0 = 1$ and $\phi^1 = 0.61803398$, we can successively apply (1.3.4) performing only a single subtraction, rather than a slower multiplication by $\phi$, at each stage.

Unfortunately, the recurrence (1.3.4) also has another solution, namely the value $-\frac{1}{2}(\sqrt{5} + 1)$. Since the recurrence is linear, and since this undesired solution has magnitude greater than unity, any small admixture of it introduced by roundoff errors will grow exponentially.
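The few lines of C below, a sketch of ours, let you watch this happen; the comparison column uses direct multiplication by phi, which stays accurate:

#include <stdio.h>

int main(void)
{
    float fm1 = 1.0f;            /* phi^(n-1), starting at phi^0 */
    float f   = 0.61803398f;     /* phi^n, starting at phi^1 */
    float direct = 0.61803398f;  /* phi^n by repeated multiplication */
    int n;
    for (n = 1; n <= 20; n++) {
        float fp1 = fm1 - f;     /* recurrence (1.3.4) */
        printf("%2d  recurrence %12.6e   direct %12.6e\n", n, f, direct);
        fm1 = f;
        f = fp1;
        direct *= 0.61803398f;
    }
    return 0;
}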

On a typical machine with 32-bit wordlength, (1.3.4) starts to give completely wrong answers by about n = 16, at which point $\phi^n$ is down to only $10^{-4}$. The recurrence (1.3.4) is unstable, and cannot be used for the purpose stated. We will encounter the question of stability in many more sophisticated guises, later in this book.

CITED REFERENCES AND FURTHER READING:

Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 1.

Kahaner, D., Moler, C., and Nash, S. 1989, Numerical Methods and Software (Englewood Cliffs, NJ: Prentice Hall), Chapter 2.

Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §1.3.

Wilkinson, J.H. 1964, Rounding Errors in Algebraic Processes (Englewood Cliffs, NJ: Prentice-Hall).

Chapter 2. Solution of Linear Algebraic Equations

2.0 Introduction

A set of linear algebraic equations looks like this:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1N}x_N &= b_1\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2N}x_N &= b_2\\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3N}x_N &= b_3\\
\cdots\qquad&\\
a_{M1}x_1 + a_{M2}x_2 + a_{M3}x_3 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \qquad (2.0.1)$$

Here the N unknowns $x_j$, $j = 1, 2, \ldots, N$ are related by M equations. The coefficients $a_{ij}$ with $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$ are known numbers, as are the right-hand side quantities $b_i$, $i = 1, 2, \ldots, M$.

Nonsingular versus Singular Sets of Equations

If N = M then there are as many equations as unknowns, and there is a good chance of solving for a unique solution set of $x_j$'s. Analytically, there can fail to be a unique solution if one or more of the M equations is a linear combination of the others, a condition called row degeneracy, or if all equations contain certain variables only in exactly the same linear combination, called column degeneracy. (For square matrices, a row degeneracy implies a column degeneracy, and vice versa.) A set of equations that is degenerate is called singular. We will consider singular matrices in some detail in §2.6.

Numerically, at least two additional things can go wrong:

• While not exact linear combinations of each other, some of the equations may be so close to linearly dependent that roundoff errors in the machine render them linearly dependent at some stage in the solution process. In this case your numerical procedure will fail, and it can tell you that it has failed.

• Accumulated roundoff errors in the solution process can swamp the true solution. This problem particularly emerges if N is too large. The numerical procedure does not fail algorithmically. However, it returns a set of x's that are wrong, as can be discovered by direct substitution back into the original equations. The closer a set of equations is to being singular, the more likely this is to happen, since increasingly close cancellations will occur during the solution. In fact, the preceding item can be viewed as the special case where the loss of significance is unfortunately total.

Much of the sophistication of complicated "linear equation-solving packages" is devoted to the detection and/or correction of these two pathologies. As you work with large linear sets of equations, you will develop a feeling for when such sophistication is needed. It is difficult to give any firm guidelines, since there is no such thing as a "typical" linear problem. But here is a rough idea: Linear sets with N as large as 20 or 50 can be routinely solved in single precision (32 bit floating representations) without resorting to sophisticated methods, if the equations are not close to singular. With double precision (60 or 64 bits), this number can readily be extended to N as large as several hundred, after which point the limiting factor is generally machine time, not accuracy. Even larger linear sets, N in the thousands or greater, can be solved when the coefficients are sparse (that is, mostly zero), by methods that take advantage of the sparseness. We discuss this further in §2.7.

At the other end of the spectrum, one seems just as often to encounter linear problems which, by their underlying nature, are close to singular. In this case, you might need to resort to sophisticated methods even for the case of N = 10 (though rarely for N = 5). Singular value decomposition (§2.6) is a technique that can sometimes turn singular problems into nonsingular ones, in which case additional sophistication becomes unnecessary.

Matrices

Equation (2.0.1) can be written in matrix form as

$$\mathbf{A} \cdot \mathbf{x} = \mathbf{b} \qquad (2.0.2)$$

Here the raised dot denotes matrix multiplication, A is the matrix of coefficients, and b is the right-hand side written as a column vector,

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1N} \\ a_{21} & a_{22} & \ldots & a_{2N} \\ & \cdots & \\ a_{M1} & a_{M2} & \ldots & a_{MN} \end{pmatrix} \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \cdots \\ b_M \end{pmatrix} \qquad (2.0.3)$$

By convention, the first index on an element $a_{ij}$ denotes its row, the second index its column. For most purposes you don't need to know how a matrix is stored in a computer's physical memory; you simply reference matrix elements by their two-dimensional addresses, e.g., $a_{34}$ = a[3][4]. We have already seen, in §1.2, that this C notation can in fact hide a rather subtle and versatile physical storage scheme, "pointer to array of pointers to rows." You might wish to review that section at this point.

Occasionally it is useful to be able to peer through the veil, for example to pass a whole row a[i][j], j = 1, ..., N by the reference a[i].

Tasks of Computational Linear Algebra

We will consider the following tasks as falling in the general purview of this chapter:

• Solution of the matrix equation A · x = b for an unknown vector x, where A is a square matrix of coefficients, raised dot denotes matrix multiplication, and b is a known right-hand side vector (§2.1–§2.10).

• Solution of more than one matrix equation A · x_j = b_j, for a set of vectors x_j, j = 1, 2, ..., each corresponding to a different, known right-hand side vector b_j. In this task the key simplification is that the matrix A is held constant, while the right-hand sides, the b's, are changed (§2.1–§2.10).

• Calculation of the matrix A^(-1) which is the matrix inverse of a square matrix A, i.e., A · A^(-1) = A^(-1) · A = 1, where 1 is the identity matrix (all zeros except for ones on the diagonal). This task is equivalent, for an N × N matrix A, to the previous task with N different b_j's (j = 1, 2, ..., N), namely the unit vectors (b_j = all zero elements except for 1 in the jth component). The corresponding x's are then the columns of the matrix inverse of A (§2.1 and §2.3).

• Calculation of the determinant of a square matrix A (§2.3).

If N < M, or if N = M but the equations are degenerate, then there are effectively fewer equations than unknowns, and there can be either no solution, or else more than one solution vector x. In the opposite case there are more equations than unknowns, M > N. When this occurs there is, in general, no solution vector x to equation (2.0.1), and the set of equations is said to be overdetermined. It happens frequently, however, that the best "compromise" solution is sought, the one that comes closest to satisfying all equations simultaneously. If closeness is defined in the least-squares sense, i.e., that the sum of the squares of the differences between the left- and right-hand sides of equation (2.0.1) be minimized, then the overdetermined linear problem reduces to a (usually) solvable linear problem, called the

• Linear least-squares problem.

The reduced set of equations to be solved can be written as the N × N set of equations

$$(\mathbf{A}^T \cdot \mathbf{A}) \cdot \mathbf{x} = \mathbf{A}^T \cdot \mathbf{b} \qquad (2.0.4)$$

where A^T denotes the transpose of the matrix A. Equations (2.0.4) are called the normal equations of the linear least-squares problem.
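For completeness, here is the one-step justification of (2.0.4), filled in by us: the least-squares objective is minimized where its gradient with respect to x vanishes.

\begin{aligned}
E(\mathbf{x}) &= \left|\mathbf{A}\cdot\mathbf{x} - \mathbf{b}\right|^2
              = (\mathbf{A}\cdot\mathbf{x} - \mathbf{b})^T (\mathbf{A}\cdot\mathbf{x} - \mathbf{b}) \\
\nabla E      &= 2\,\mathbf{A}^T \cdot (\mathbf{A}\cdot\mathbf{x} - \mathbf{b}) = 0
 \quad\Longrightarrow\quad
 (\mathbf{A}^T \cdot \mathbf{A})\cdot\mathbf{x} = \mathbf{A}^T \cdot \mathbf{b}
\end{aligned}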

There is a close connection between singular value decomposition and the linear least-squares problem, and the latter is also discussed in §2.6. You should be warned that direct solution of the normal equations (2.0.4) is not generally the best way to find least-squares solutions.

Some other topics in this chapter include

• Iterative improvement of a solution (§2.5)
• Various special forms: symmetric positive-definite (§2.9), tridiagonal (§2.4), band diagonal (§2.4), Toeplitz (§2.8), Vandermonde (§2.8), sparse (§2.7)
• Strassen's "fast matrix inversion" (§2.11).

Standard Subroutine Packages

We cannot hope, in this chapter or in this book, to tell you everything there is to know about the tasks that have been defined above. In many cases you will have no alternative but to use sophisticated black-box program packages. Several good ones are available, though not always in C. LINPACK was developed at Argonne National Laboratories and deserves particular mention because it is published, documented, and available for free use. A successor to LINPACK, LAPACK, is now becoming available. Packages available commercially (though not necessarily in C) include those in the IMSL and NAG libraries.

You should keep in mind that the sophisticated packages are designed with very large linear systems in mind. They therefore go to great effort to minimize not only the number of operations, but also the required storage. Routines for the various tasks are usually provided in several versions, corresponding to several possible simplifications in the form of the input coefficient matrix: symmetric, triangular, banded, positive definite, etc. If you have a large matrix in one of these forms, you should certainly take advantage of the increased efficiency provided by these different routines, and not just use the form provided for general matrices.

There is also a great watershed dividing routines that are direct (i.e., execute in a predictable number of operations) from routines that are iterative (i.e., attempt to converge to the desired answer in however many steps are necessary). Iterative methods become preferable when the battle against loss of significance is in danger of being lost, either due to large N or because the problem is close to singular. We will treat iterative methods only incompletely in this book, in §2.7 and in Chapters 18 and 19. These methods are important, but mostly beyond our scope. We will, however, discuss in detail a technique which is on the borderline between direct and iterative methods, namely the iterative improvement of a solution that has been obtained by direct methods (§2.5).

CITED REFERENCES AND FURTHER READING:

Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press).

Gill, P.E., Murray, W., and Wright, M.H. 1991, Numerical Linear Algebra and Optimization, vol. 1 (Redwood City, CA: Addison-Wesley).
1 (Redwood City, CA: Addison-Wesley).
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 4.
Dongarra, J.J., et al. 1979, LINPACK User's Guide (Philadelphia: S.I.A.M.).

Coleman, T.F., and Van Loan, C. 1988, Handbook for Matrix Computations (Philadelphia: S.I.A.M.).
Forsythe, G.E., and Moler, C.B. 1967, Computer Solution of Linear Algebraic Systems (Englewood Cliffs, NJ: Prentice-Hall).
Wilkinson, J.H., and Reinsch, C. 1971, Linear Algebra, vol. II of Handbook for Automatic Computation (New York: Springer-Verlag).
Westlake, J.R. 1968, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations (New York: Wiley).
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), Chapter 2.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), Chapter 9.

2.1 Gauss-Jordan Elimination

For inverting a matrix, Gauss-Jordan elimination is about as efficient as any other method. For solving sets of linear equations, Gauss-Jordan elimination produces both the solution of the equations for one or more right-hand side vectors b, and also the matrix inverse A^{-1}. However, its principal weaknesses are (i) that it requires all the right-hand sides to be stored and manipulated at the same time, and (ii) that when the inverse matrix is not desired, Gauss-Jordan is three times slower than the best alternative technique for solving a single linear set (§2.3). The method's principal strength is that it is as stable as any other direct method, perhaps even a bit more stable when full pivoting is used (see below).

If you come along later with an additional right-hand side vector, you can multiply it by the inverse matrix, of course. This does give an answer, but one that is quite susceptible to roundoff error, not nearly as good as if the new vector had been included with the set of right-hand side vectors in the first instance.

For these reasons, Gauss-Jordan elimination should usually not be your method of first choice, either for solving linear equations or for matrix inversion. The decomposition methods in §2.3 are better. Why do we give you Gauss-Jordan at all? Because it is straightforward, understandable, solid as a rock, and an exceptionally good "psychological" backup for those times that something is going wrong and you think it might be your linear-equation solver.

Some people believe that the backup is more than psychological, that Gauss-Jordan elimination is an "independent" numerical method. This turns out to be mostly myth. Except for the relatively minor differences in pivoting, described below, the actual sequence of operations performed in Gauss-Jordan elimination is very closely related to that performed by the routines in the next two sections.

For clarity, and to avoid writing endless ellipses (···) we will write out equations only for the case of four equations and four unknowns, and with three different right-hand side vectors that are known in advance.
You can write bigger matrices and extend the equations to the case of N × N matrices, with M sets of right-hand side vectors, in completely analogous fashion. The routine implemented below is, of course, general.
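Concretely, a calling fragment in the style used throughout this book might look as follows. This is a hypothetical sketch, not a fragment from the book; it assumes the nrutil allocators matrix and free_matrix, and the routine gaussj given later in this section.

float **a,**b;
int n=4,m=3;
...
a=matrix(1,n,1,n);              /* the coefficient matrix of equation (2.1.1) */
b=matrix(1,n,1,m);              /* the m right-hand sides, stored as columns */
/* ... fill a[i][j] and b[i][j] ... */
gaussj(a,n,b,m);                /* a is replaced by its inverse, b by the solutions */
free_matrix(b,1,n,1,m);
free_matrix(a,1,n,1,n);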

Elimination on Column-Augmented Matrices

Consider the linear matrix equation

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14}\\
a_{21} & a_{22} & a_{23} & a_{24}\\
a_{31} & a_{32} & a_{33} & a_{34}\\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\cdot
\begin{bmatrix}
\begin{pmatrix} x_{11}\\ x_{21}\\ x_{31}\\ x_{41} \end{pmatrix} \sqcup
\begin{pmatrix} x_{12}\\ x_{22}\\ x_{32}\\ x_{42} \end{pmatrix} \sqcup
\begin{pmatrix} x_{13}\\ x_{23}\\ x_{33}\\ x_{43} \end{pmatrix} \sqcup
\begin{pmatrix}
y_{11} & y_{12} & y_{13} & y_{14}\\
y_{21} & y_{22} & y_{23} & y_{24}\\
y_{31} & y_{32} & y_{33} & y_{34}\\
y_{41} & y_{42} & y_{43} & y_{44}
\end{pmatrix}
\end{bmatrix}
=
\begin{bmatrix}
\begin{pmatrix} b_{11}\\ b_{21}\\ b_{31}\\ b_{41} \end{pmatrix} \sqcup
\begin{pmatrix} b_{12}\\ b_{22}\\ b_{32}\\ b_{42} \end{pmatrix} \sqcup
\begin{pmatrix} b_{13}\\ b_{23}\\ b_{33}\\ b_{43} \end{pmatrix} \sqcup
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}
\end{bmatrix}
\tag{2.1.1}
\]

Here the raised dot (·) signifies matrix multiplication, while the operator ⊔ just signifies column augmentation, that is, removing the abutting parentheses and making a wider matrix out of the operands of the ⊔ operator.

It should not take you long to write out equation (2.1.1) and to see that it simply states that x_{ij} is the ith component (i = 1, 2, 3, 4) of the vector solution of the jth right-hand side (j = 1, 2, 3), the one whose coefficients are b_{ij}, i = 1, 2, 3, 4; and that the matrix of unknown coefficients y_{ij} is the inverse matrix of a_{ij}. In other words, the matrix solution of

\[ A \cdot [\,x_1 \sqcup x_2 \sqcup x_3 \sqcup Y\,] = [\,b_1 \sqcup b_2 \sqcup b_3 \sqcup 1\,] \tag{2.1.2} \]

where A and Y are square matrices, the b_i's and x_i's are column vectors, and 1 is the identity matrix, simultaneously solves the linear sets

\[ A \cdot x_1 = b_1 \qquad A \cdot x_2 = b_2 \qquad A \cdot x_3 = b_3 \tag{2.1.3} \]

and

\[ A \cdot Y = 1 \tag{2.1.4} \]

Now it is also elementary to verify the following facts about (2.1.1):
• Interchanging any two rows of A and the corresponding rows of the b's and of 1, does not change (or scramble in any way) the solution x's and Y. Rather, it just corresponds to writing the same set of linear equations in a different order.
• Likewise, the solution set is unchanged and in no way scrambled if we replace any row in A by a linear combination of itself and any other row, as long as we do the same linear combination of the rows of the b's and 1 (which then is no longer the identity matrix, of course).
• Interchanging any two columns of A gives the same solution set only if we simultaneously interchange corresponding rows of the x's and of Y. In other words, this interchange scrambles the order of the rows in the solution. If we do this, we will need to unscramble the solution by restoring the rows to their original order.

Gauss-Jordan elimination uses one or more of the above operations to reduce the matrix A to the identity matrix. When this is accomplished, the right-hand side becomes the solution set, as one sees instantly from (2.1.2).

Pivoting

In "Gauss-Jordan elimination with no pivoting," only the second operation in the above list is used. The first row is divided by the element a_{11} (this being a trivial linear combination of the first row with any other row — zero coefficient for the other row). Then the right amount of the first row is subtracted from each other row to make all the remaining a_{i1}'s zero. The first column of A now agrees with the identity matrix. We move to the second column and divide the second row by a_{22}, then subtract the right amount of the second row from rows 1, 3, and 4, so as to make their entries in the second column zero. The second column is now reduced to the identity form. And so on for the third and fourth columns. As we do these operations to A, we of course also do the corresponding operations to the b's and to 1 (which by now no longer resembles the identity matrix in any way!).

Obviously we will run into trouble if we ever encounter a zero element on the (then current) diagonal when we are going to divide by the diagonal element. (The element that we divide by, incidentally, is called the pivot element or pivot.) Not so obvious, but true, is the fact that Gauss-Jordan elimination with no pivoting (no use of the first or third procedures in the above list) is numerically unstable in the presence of any roundoff error, even when a zero pivot is not encountered. You must never do Gauss-Jordan elimination (or Gaussian elimination, see below) without pivoting!

So what is this magic pivoting? Nothing more than interchanging rows (partial pivoting) or rows and columns (full pivoting), so as to put a particularly desirable element in the diagonal position from which the pivot is about to be selected. Since we don't want to mess up the part of the identity matrix that we have already built up, we can choose among elements that are both (i) on rows below (or on) the one that is about to be normalized, and also (ii) on columns to the right (or on) the column we are about to eliminate. Partial pivoting is easier than full pivoting, because we don't have to keep track of the permutation of the solution vector. Partial pivoting makes available as pivots only the elements already in the correct column. It turns out that partial pivoting is "almost" as good as full pivoting, in a sense that can be made mathematically precise, but which need not concern us here (for discussion and references, see [1]). To show you both variants, we do full pivoting in the routine in this section, partial pivoting in §2.3.

We have to state how to recognize a particularly desirable pivot when we see one. The answer to this is not completely known theoretically. It is known, both theoretically and in practice, that simply picking the largest (in magnitude) available element as the pivot is a very good choice.
A curiosity of this procedure, however, is that the choice of pivot will depend on the original scaling of the equations. If we take the third linear equation in our original set and multiply it by a factor of a million, it is almost guaranteed that it will contribute the first pivot; yet the underlying solution of the equations is not changed by this multiplication! One therefore sometimes sees routines which choose as pivot that element which would have been largest if the original equations had all been scaled to have their largest coefficient normalized to unity. This is called implicit pivoting. There is some extra bookkeeping to keep track of the scale factors by which the rows would have been multiplied. (The routines in §2.3 include implicit pivoting, but the routine in this section does not.)

Finally, let us consider the storage requirements of the method. With a little reflection you will see that at every stage of the algorithm, either an element of A is

predictably a one or zero (if it is already in a part of the matrix that has been reduced to identity form) or else the exactly corresponding element of the matrix that started as 1 is predictably a one or zero (if its mate in A has not been reduced to the identity form). Therefore the matrix 1 does not have to exist as separate storage: The matrix inverse of A is gradually built up in A as the original A is destroyed. Likewise, the solution vectors x can gradually replace the right-hand side vectors b and share the same storage, since after each column in A is reduced, the corresponding row entry in the b's is never again used.

Here is the routine for Gauss-Jordan elimination with full pivoting:

#include <math.h>
#include "nrutil.h"
#define SWAP(a,b) {temp=(a);(a)=(b);(b)=temp;}

void gaussj(float **a, int n, float **b, int m)
Linear equation solution by Gauss-Jordan elimination, equation (2.1.1) above. a[1..n][1..n] is the input matrix. b[1..n][1..m] is input containing the m right-hand side vectors. On output, a is replaced by its matrix inverse, and b is replaced by the corresponding set of solution vectors.
{
    int *indxc,*indxr,*ipiv;
    int i,icol,irow,j,k,l,ll;
    float big,dum,pivinv,temp;

    indxc=ivector(1,n);         The integer arrays ipiv, indxr, and indxc are
    indxr=ivector(1,n);         used for bookkeeping on the pivoting.
    ipiv=ivector(1,n);
    for (j=1;j<=n;j++) ipiv[j]=0;
    for (i=1;i<=n;i++) {        This is the main loop over the columns to be reduced.
        big=0.0;
        for (j=1;j<=n;j++)      This is the outer loop of the search for a pivot element.
            if (ipiv[j] != 1)
                for (k=1;k<=n;k++) {
                    if (ipiv[k] == 0) {
                        if (fabs(a[j][k]) >= big) {
                            big=fabs(a[j][k]);
                            irow=j;
                            icol=k;
                        }
                    }
                }
        ++(ipiv[icol]);
        We now have the pivot element, so we interchange rows, if needed, to put the pivot element on the diagonal. The columns are not physically interchanged, only relabeled: indxc[i], the column of the ith pivot element, is the ith column that is reduced, while indxr[i] is the row in which that pivot element was originally located. If indxr[i] ≠ indxc[i] there is an implied column interchange. With this form of bookkeeping, the solution b's will end up in the correct order, and the inverse matrix will be scrambled by columns.
        if (irow != icol) {
            for (l=1;l<=n;l++) SWAP(a[irow][l],a[icol][l])
            for (l=1;l<=m;l++) SWAP(b[irow][l],b[icol][l])
        }
        indxr[i]=irow;          We are now ready to divide the pivot row by the
        indxc[i]=icol;          pivot element, located at irow and icol.
        if (a[icol][icol] == 0.0) nrerror("gaussj: Singular Matrix");
        pivinv=1.0/a[icol][icol];
        a[icol][icol]=1.0;
        for (l=1;l<=n;l++) a[icol][l] *= pivinv;
        for (l=1;l<=m;l++) b[icol][l] *= pivinv;

        for (ll=1;ll<=n;ll++)   Next, we reduce the rows...
            if (ll != icol) {   ...except for the pivot one, of course.
                dum=a[ll][icol];
                a[ll][icol]=0.0;
                for (l=1;l<=n;l++) a[ll][l] -= a[icol][l]*dum;
                for (l=1;l<=m;l++) b[ll][l] -= b[icol][l]*dum;
            }
    }
    This is the end of the main loop over columns of the reduction. It only remains to unscramble the solution in view of the column interchanges. We do this by interchanging pairs of columns in the reverse order that the permutation was built up.
    for (l=n;l>=1;l--) {
        if (indxr[l] != indxc[l])
            for (k=1;k<=n;k++)
                SWAP(a[k][indxr[l]],a[k][indxc[l]]);
    }                           And we are done.
    free_ivector(ipiv,1,n);
    free_ivector(indxr,1,n);
    free_ivector(indxc,1,n);
}

Row versus Column Elimination Strategies

The above discussion can be amplified by a modest amount of formalism. Row operations on a matrix A correspond to pre- (that is, left-) multiplication by some simple matrix R. For example, the matrix R with components

\[
R_{ij} =
\begin{cases}
1 & \text{if } i = j \text{ and } i \neq 2, 4 \\
1 & \text{if } i = 2,\ j = 4 \\
1 & \text{if } i = 4,\ j = 2 \\
0 & \text{otherwise}
\end{cases}
\tag{2.1.5}
\]

effects the interchange of rows 2 and 4. Gauss-Jordan elimination by row operations alone (including the possibility of partial pivoting) consists of a series of such left-multiplications, yielding successively

\[
\begin{aligned}
A \cdot x &= b \\
(\cdots R_3 \cdot R_2 \cdot R_1 \cdot A) \cdot x &= \cdots R_3 \cdot R_2 \cdot R_1 \cdot b \\
(1) \cdot x &= \cdots R_3 \cdot R_2 \cdot R_1 \cdot b \\
x &= \cdots R_3 \cdot R_2 \cdot R_1 \cdot b
\end{aligned}
\tag{2.1.6}
\]

The key point is that since the R's build from right to left, the right-hand side is simply transformed at each stage from one vector to another.

Column operations, on the other hand, correspond to post-, or right-, multiplications by simple matrices, call them C. The matrix in equation (2.1.5), if right-multiplied onto a matrix A, will interchange A's second and fourth columns. Elimination by column operations involves (conceptually) inserting a column operator, and also its inverse, between the matrix A and the unknown vector x:

\[
\begin{aligned}
A \cdot x &= b \\
A \cdot C_1 \cdot C_1^{-1} \cdot x &= b \\
A \cdot C_1 \cdot C_2 \cdot C_2^{-1} \cdot C_1^{-1} \cdot x &= b \\
(A \cdot C_1 \cdot C_2 \cdot C_3 \cdots) \cdots C_3^{-1} \cdot C_2^{-1} \cdot C_1^{-1} \cdot x &= b \\
(1) \cdots C_3^{-1} \cdot C_2^{-1} \cdot C_1^{-1} \cdot x &= b
\end{aligned}
\tag{2.1.7}
\]

which (peeling off the C^{-1}'s one at a time) implies a solution

\[ x = C_1 \cdot C_2 \cdot C_3 \cdots b \tag{2.1.8} \]

Notice the essential difference between equation (2.1.8) and equation (2.1.6). In (2.1.8) the C's must be applied to b in the reverse order from that in which they become known. That is, they must all be stored along the way. This requirement greatly reduces the usefulness of column operations, generally restricting them to simple permutations, for example in support of full pivoting.

CITED REFERENCES AND FURTHER READING:
Wilkinson, J.H. 1965, The Algebraic Eigenvalue Problem (New York: Oxford University Press). [1]
Carnahan, B., Luther, H.A., and Wilkes, J.O. 1969, Applied Numerical Methods (New York: Wiley), Example 5.2, p. 282.
Bevington, P.R. 1969, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill), Program B-2, p. 298.
Westlake, J.R. 1968, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations (New York: Wiley).
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.3–1.

2.2 Gaussian Elimination with Backsubstitution

The usefulness of Gaussian elimination with backsubstitution is primarily pedagogical. It stands between full elimination schemes such as Gauss-Jordan, and triangular decomposition schemes such as will be discussed in the next section. Gaussian elimination reduces a matrix not all the way to the identity matrix, but only halfway, to a matrix whose components on the diagonal and above (say) remain nontrivial. Let us now see what advantages accrue.

Suppose that in doing Gauss-Jordan elimination, as described in §2.1, we at each stage subtract away rows only below the then-current pivot element. When a_{22} is the pivot element, for example, we divide the second row by its value (as before), but now use the pivot row to zero only a_{32} and a_{42}, not a_{12} (see equation 2.1.1). Suppose, also, that we do only partial pivoting, never interchanging columns, so that the order of the unknowns never needs to be modified.

Then, when we have done this for all the pivots, we will be left with a reduced equation that looks like this (in the case of a single right-hand side vector):

\[
\begin{bmatrix}
a'_{11} & a'_{12} & a'_{13} & a'_{14} \\
0 & a'_{22} & a'_{23} & a'_{24} \\
0 & 0 & a'_{33} & a'_{34} \\
0 & 0 & 0 & a'_{44}
\end{bmatrix}
\cdot
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \end{bmatrix}
\tag{2.2.1}
\]

Here the primes signify that the a's and b's do not have their original numerical values, but have been modified by all the row operations in the elimination to this point. The procedure up to this point is termed Gaussian elimination.
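No routine for this reduction is given in this section (the reasons are explained at the end of the section). Purely to make the description concrete, here is a minimal sketch, not an official Numerical Recipes routine, of the forward-elimination stage with partial pivoting; it produces the triangular system (2.2.1), to be finished off by the backsubstitution described next. NR-style unit-offset arrays are assumed.

#include <math.h>
#include "nrutil.h"

void gausselim(float **a, int n, float b[])
/* Hedged sketch: reduce a[1..n][1..n] and b[1..n] in place to the upper
   triangular form of equation (2.2.1), with partial (row) pivoting. This
   variant subtracts multiples of the unnormalized pivot row; the text's
   description also divides each pivot row through by the pivot, which
   changes only the diagonal scaling. */
{
    int i,j,k,imax;
    float big,dum,temp;

    for (k=1;k<=n;k++) {
        big=fabs(a[k][k]);      /* find the largest pivot candidate in column k */
        imax=k;
        for (i=k+1;i<=n;i++)
            if (fabs(a[i][k]) > big) { big=fabs(a[i][k]); imax=i; }
        if (big == 0.0) nrerror("gausselim: singular matrix");
        if (imax != k) {        /* interchange rows k and imax */
            for (j=k;j<=n;j++) { temp=a[k][j]; a[k][j]=a[imax][j]; a[imax][j]=temp; }
            temp=b[k]; b[k]=b[imax]; b[imax]=temp;
        }
        for (i=k+1;i<=n;i++) {  /* zero the entries below the pivot */
            dum=a[i][k]/a[k][k];
            for (j=k;j<=n;j++) a[i][j] -= dum*a[k][j];
            b[i] -= dum*b[k];
        }
    }
}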

Backsubstitution

But how do we solve for the x's? The last x (x_4 in this example) is already isolated, namely

\[ x_4 = b'_4 / a'_{44} \tag{2.2.2} \]

With the last x known we can move to the penultimate x,

\[ x_3 = \frac{1}{a'_{33}} \left[ b'_3 - x_4 a'_{34} \right] \tag{2.2.3} \]

and then proceed with the x before that one. The typical step is

\[ x_i = \frac{1}{a'_{ii}} \left[ b'_i - \sum_{j=i+1}^{N} a'_{ij} x_j \right] \tag{2.2.4} \]

The procedure defined by equation (2.2.4) is called backsubstitution. The combination of Gaussian elimination and backsubstitution yields a solution to the set of equations.

The advantage of Gaussian elimination and backsubstitution over Gauss-Jordan elimination is simply that the former is faster in raw operations count: The innermost loops of Gauss-Jordan elimination, each containing one subtraction and one multiplication, are executed N^3 and N^2 M times (where there are N equations and unknowns, and M right-hand sides). The corresponding loops in Gaussian elimination are executed only (1/3)N^3 times (only half the matrix is reduced, and the increasing numbers of predictable zeros reduce the count to one-third), and (1/2)N^2 M times, respectively. Each backsubstitution of a right-hand side is (1/2)N^2 executions of a similar loop (one multiplication plus one subtraction). For M ≪ N (only a few right-hand sides) Gaussian elimination thus has about a factor three advantage over Gauss-Jordan. (We could reduce this advantage to a factor 1.5 by not computing the inverse matrix as part of the Gauss-Jordan scheme.)

For computing the inverse matrix (which we can view as the case of M = N right-hand sides, namely the N unit vectors which are the columns of the identity matrix), Gaussian elimination and backsubstitution at first glance require (1/3)N^3 (matrix reduction) + (1/2)N^3 (right-hand side manipulations) + (1/2)N^3 (N backsubstitutions) = (4/3)N^3 loop executions, which is more than the N^3 for Gauss-Jordan. However, the unit vectors are quite special in containing all zeros except for one element. If this is taken into account, the right-side manipulations can be reduced to only (1/6)N^3 loop executions, and, for matrix inversion, the two methods have identical efficiencies.

Both Gaussian elimination and Gauss-Jordan elimination share the disadvantage that all right-hand sides must be known in advance. The LU decomposition method in the next section does not share that deficiency, and also has an equally small operations count, both for solution with any number of right-hand sides, and for matrix inversion. For this reason we will not implement the method of Gaussian elimination as a routine.

CITED REFERENCES AND FURTHER READING:
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.3–1.

Isaacson, E., and Keller, H.B. 1966, Analysis of Numerical Methods (New York: Wiley), §2.1.
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §2.2.1.
Westlake, J.R. 1968, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations (New York: Wiley).

2.3 LU Decomposition and Its Applications

Suppose we are able to write the matrix A as a product of two matrices,

\[ L \cdot U = A \tag{2.3.1} \]

where L is lower triangular (has elements only on the diagonal and below) and U is upper triangular (has elements only on the diagonal and above). For the case of a 4 × 4 matrix A, for example, equation (2.3.1) would look like this:

\[
\begin{bmatrix}
\alpha_{11} & 0 & 0 & 0 \\
\alpha_{21} & \alpha_{22} & 0 & 0 \\
\alpha_{31} & \alpha_{32} & \alpha_{33} & 0 \\
\alpha_{41} & \alpha_{42} & \alpha_{43} & \alpha_{44}
\end{bmatrix}
\cdot
\begin{bmatrix}
\beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \\
0 & \beta_{22} & \beta_{23} & \beta_{24} \\
0 & 0 & \beta_{33} & \beta_{34} \\
0 & 0 & 0 & \beta_{44}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\tag{2.3.2}
\]

We can use a decomposition such as (2.3.1) to solve the linear set

\[ A \cdot x = (L \cdot U) \cdot x = L \cdot (U \cdot x) = b \tag{2.3.3} \]

by first solving for the vector y such that

\[ L \cdot y = b \tag{2.3.4} \]

and then solving

\[ U \cdot x = y \tag{2.3.5} \]

What is the advantage of breaking up one linear set into two successive ones? The advantage is that the solution of a triangular set of equations is quite trivial, as we have already seen in §2.2 (equation 2.2.4). Thus, equation (2.3.4) can be solved by forward substitution as follows,

\[
\begin{aligned}
y_1 &= \frac{b_1}{\alpha_{11}} \\
y_i &= \frac{1}{\alpha_{ii}} \left[ b_i - \sum_{j=1}^{i-1} \alpha_{ij} y_j \right] \qquad i = 2, 3, \ldots, N
\end{aligned}
\tag{2.3.6}
\]

while (2.3.5) can then be solved by backsubstitution exactly as in equations (2.2.2)–(2.2.4),

\[
\begin{aligned}
x_N &= \frac{y_N}{\beta_{NN}} \\
x_i &= \frac{1}{\beta_{ii}} \left[ y_i - \sum_{j=i+1}^{N} \beta_{ij} x_j \right] \qquad i = N-1, N-2, \ldots, 1
\end{aligned}
\tag{2.3.7}
\]
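As an aside, equations (2.3.6) and (2.3.7) transcribe directly into code. The following is a minimal sketch, not a book routine: it assumes the factors are held in two separate full unit-offset arrays l and u (with the zeros stored explicitly), whereas the routines ludcmp and lubksb below pack both factors into a single matrix and fold in a pivoting permutation.

#include "nrutil.h"

void lusolve(float **l, float **u, int n, float b[], float x[])
/* Hedged sketch of equations (2.3.6) and (2.3.7): solve L·y = b by forward
   substitution, then U·x = y by backsubstitution. */
{
    int i,j;
    float sum,*y;

    y=vector(1,n);
    for (i=1;i<=n;i++) {        /* forward substitution, equation (2.3.6) */
        sum=b[i];
        for (j=1;j<i;j++) sum -= l[i][j]*y[j];
        y[i]=sum/l[i][i];
    }
    for (i=n;i>=1;i--) {        /* backsubstitution, equation (2.3.7) */
        sum=y[i];
        for (j=i+1;j<=n;j++) sum -= u[i][j]*x[j];
        x[i]=sum/u[i][i];
    }
    free_vector(y,1,n);
}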

Equations (2.3.6) and (2.3.7) total (for each right-hand side b) N^2 executions of an inner loop containing one multiply and one add. If we have N right-hand sides which are the unit column vectors (which is the case when we are inverting a matrix), then taking into account the leading zeros reduces the total execution count of (2.3.6) from (1/2)N^3 to (1/6)N^3, while (2.3.7) is unchanged at (1/2)N^3.

Notice that, once we have the LU decomposition of A, we can solve with as many right-hand sides as we then care to, one at a time. This is a distinct advantage over the methods of §2.1 and §2.2.

Performing the LU Decomposition

How then can we solve for L and U, given A? First, we write out the i,jth component of equation (2.3.1) or (2.3.2). That component always is a sum beginning with

α_{i1}β_{1j} + ··· = a_{ij}

The number of terms in the sum depends, however, on whether i or j is the smaller number. We have, in fact, the three cases,

\[
\begin{aligned}
i < j:&\quad \alpha_{i1}\beta_{1j} + \alpha_{i2}\beta_{2j} + \cdots + \alpha_{ii}\beta_{ij} = a_{ij} &\quad&(2.3.8) \\
i = j:&\quad \alpha_{i1}\beta_{1j} + \alpha_{i2}\beta_{2j} + \cdots + \alpha_{ii}\beta_{jj} = a_{ij} &&(2.3.9) \\
i > j:&\quad \alpha_{i1}\beta_{1j} + \alpha_{i2}\beta_{2j} + \cdots + \alpha_{ij}\beta_{jj} = a_{ij} &&(2.3.10)
\end{aligned}
\]

Equations (2.3.8)–(2.3.10) total N^2 equations for the N^2 + N unknown α's and β's (the diagonal being represented twice). Since the number of unknowns is greater than the number of equations, we are invited to specify N of the unknowns arbitrarily and then try to solve for the others. In fact, as we shall see, it is always possible to take

\[ \alpha_{ii} \equiv 1 \qquad i = 1, \ldots, N \tag{2.3.11} \]

A surprising procedure, now, is Crout's algorithm, which quite trivially solves the set of N^2 + N equations (2.3.8)–(2.3.11) for all the α's and β's by just arranging the equations in a certain order! That order is as follows:

• Set α_{ii} = 1, i = 1, ..., N (equation 2.3.11).
• For each j = 1, 2, 3, ..., N do these two procedures: First, for i = 1, 2, ..., j, use (2.3.8), (2.3.9), and (2.3.11) to solve for β_{ij}, namely

\[ \beta_{ij} = a_{ij} - \sum_{k=1}^{i-1} \alpha_{ik}\beta_{kj} \tag{2.3.12} \]

(When i = 1 in 2.3.12 the summation term is taken to mean zero.) Second, for i = j+1, j+2, ..., N use (2.3.10) to solve for α_{ij}, namely

\[ \alpha_{ij} = \frac{1}{\beta_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} \alpha_{ik}\beta_{kj} \right) \tag{2.3.13} \]

Be sure to do both procedures before going on to the next j.
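To fix the ordering in your mind, here is a minimal sketch of Crout's algorithm exactly as itemized above, with no pivoting. It is not the book's routine (the full routine with pivoting, ludcmp, follows below), and without pivoting it is unstable in general, for the reasons discussed next. The combined matrix of α's and β's overwrites a in place, as in equation (2.3.14) below.

#include "nrutil.h"

void crout(float **a, int n)
/* Hedged sketch of equations (2.3.12)-(2.3.13), no pivoting. */
{
    int i,j,k;
    float sum;

    for (j=1;j<=n;j++) {
        for (i=1;i<=j;i++) {    /* equation (2.3.12): the beta's */
            sum=a[i][j];
            for (k=1;k<i;k++) sum -= a[i][k]*a[k][j];
            a[i][j]=sum;
        }
        for (i=j+1;i<=n;i++) {  /* equation (2.3.13): the alpha's */
            sum=a[i][j];
            for (k=1;k<j;k++) sum -= a[i][k]*a[k][j];
            if (a[j][j] == 0.0) nrerror("crout: zero pivot encountered");
            a[i][j]=sum/a[j][j];
        }
    }
}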

Figure 2.3.1. Crout's algorithm for LU decomposition of a matrix. Elements of the original matrix are modified in the order indicated by lower case letters: a, b, c, etc. Shaded boxes show the previously modified elements that are used in modifying two typical elements, each indicated by an "x".

If you work through a few iterations of the above procedure, you will see that the α's and β's that occur on the right-hand side of equations (2.3.12) and (2.3.13) are already determined by the time they are needed. You will also see that every a_{ij} is used only once and never again. This means that the corresponding α_{ij} or β_{ij} can be stored in the location that the a_{ij} used to occupy: the decomposition is "in place." [The diagonal unity elements α_{ii} (equation 2.3.11) are not stored at all.] In brief, Crout's method fills in the combined matrix of α's and β's,

\[
\begin{bmatrix}
\beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \\
\alpha_{21} & \beta_{22} & \beta_{23} & \beta_{24} \\
\alpha_{31} & \alpha_{32} & \beta_{33} & \beta_{34} \\
\alpha_{41} & \alpha_{42} & \alpha_{43} & \beta_{44}
\end{bmatrix}
\tag{2.3.14}
\]

by columns from left to right, and within each column from top to bottom (see Figure 2.3.1).

What about pivoting? Pivoting (i.e., selection of a salubrious pivot element for the division in equation 2.3.13) is absolutely essential for the stability of Crout's

method. Only partial pivoting (interchange of rows) can be implemented efficiently. However this is enough to make the method stable. This means, incidentally, that we don't actually decompose the matrix A into LU form, but rather we decompose a rowwise permutation of A. (If we keep track of what that permutation is, this decomposition is just as useful as the original one would have been.)

Pivoting is slightly subtle in Crout's algorithm. The key point to notice is that equation (2.3.12) in the case of i = j (its final application) is exactly the same as equation (2.3.13) except for the division in the latter equation; in both cases the upper limit of the sum is k = j − 1 (= i − 1). This means that we don't have to commit ourselves as to whether the diagonal element β_{jj} is the one that happens to fall on the diagonal in the first instance, or whether one of the (undivided) α_{ij}'s below it in the column, i = j+1, ..., N, is to be "promoted" to become the diagonal β. This can be decided after all the candidates in the column are in hand. As you should be able to guess by now, we will choose the largest one as the diagonal β (pivot element), then do all the divisions by that element en masse. This is Crout's method with partial pivoting. Our implementation has one additional wrinkle: It initially finds the largest element in each row, and subsequently (when it is looking for the maximal pivot element) scales the comparison as if we had initially scaled all the equations to make their maximum coefficient equal to unity; this is the implicit pivoting mentioned in §2.1.

#include <math.h>
#include "nrutil.h"
#define TINY 1.0e-20        A small number.

void ludcmp(float **a, int n, int *indx, float *d)
Given a matrix a[1..n][1..n], this routine replaces it by the LU decomposition of a rowwise permutation of itself. a and n are input. a is output, arranged as in equation (2.3.14) above; indx[1..n] is an output vector that records the row permutation effected by the partial pivoting; d is output as ±1 depending on whether the number of row interchanges was even or odd, respectively. This routine is used in combination with lubksb to solve linear equations or invert a matrix.
{
    int i,imax,j,k;
    float big,dum,sum,temp;
    float *vv;                  vv stores the implicit scaling of each row.

    vv=vector(1,n);
    *d=1.0;                     No row interchanges yet.
    for (i=1;i<=n;i++) {        Loop over rows to get the implicit scaling information.
        big=0.0;
        for (j=1;j<=n;j++)
            if ((temp=fabs(a[i][j])) > big) big=temp;
        if (big == 0.0) nrerror("Singular matrix in routine ludcmp");
        No nonzero largest element.
        vv[i]=1.0/big;          Save the scaling.
    }
    for (j=1;j<=n;j++) {        This is the loop over columns of Crout's method.
        for (i=1;i<j;i++) {     This is equation (2.3.12) except for i = j.
            sum=a[i][j];
            for (k=1;k<i;k++) sum -= a[i][k]*a[k][j];
            a[i][j]=sum;
        }
        big=0.0;                Initialize for the search for largest pivot element.
        for (i=j;i<=n;i++) {    This is i = j of equation (2.3.12) and i = j+1, ..., N of equation (2.3.13).
            sum=a[i][j];
            for (k=1;k<j;k++)

                sum -= a[i][k]*a[k][j];
            a[i][j]=sum;
            if ( (dum=vv[i]*fabs(sum)) >= big) {
                Is the figure of merit for the pivot better than the best so far?
                big=dum;
                imax=i;
            }
        }
        if (j != imax) {        Do we need to interchange rows?
            for (k=1;k<=n;k++) {    Yes, do so...
                dum=a[imax][k];
                a[imax][k]=a[j][k];
                a[j][k]=dum;
            }
            *d = -(*d);         ...and change the parity of d.
            vv[imax]=vv[j];     Also interchange the scale factor.
        }
        indx[j]=imax;
        if (a[j][j] == 0.0) a[j][j]=TINY;
        If the pivot element is zero the matrix is singular (at least to the precision of the algorithm). For some applications on singular matrices, it is desirable to substitute TINY for zero.
        if (j != n) {           Now, finally, divide by the pivot element.
            dum=1.0/(a[j][j]);
            for (i=j+1;i<=n;i++) a[i][j] *= dum;
        }
    }                           Go back for the next column in the reduction.
    free_vector(vv,1,n);
}

Here is the routine for forward substitution and backsubstitution, implementing equations (2.3.6) and (2.3.7).

void lubksb(float **a, int n, int *indx, float b[])
Solves the set of n linear equations A · X = B. Here a[1..n][1..n] is input, not as the matrix A but rather as its LU decomposition, determined by the routine ludcmp. indx[1..n] is input as the permutation vector returned by ludcmp. b[1..n] is input as the right-hand side vector B, and returns with the solution vector X. a, n, and indx are not modified by this routine and can be left in place for successive calls with different right-hand sides b. This routine takes into account the possibility that b will begin with many zero elements, so it is efficient for use in matrix inversion.
{
    int i,ii=0,ip,j;
    float sum;

    for (i=1;i<=n;i++) {        When ii is set to a positive value, it will become the
        ip=indx[i];             index of the first nonvanishing element of b. We now
        sum=b[ip];              do the forward substitution, equation (2.3.6). The
        b[ip]=b[i];             only new wrinkle is to unscramble the permutation
        if (ii)                 as we go.
            for (j=ii;j<=i-1;j++) sum -= a[i][j]*b[j];
        else if (sum) ii=i;     A nonzero element was encountered, so from now on we
        b[i]=sum;               will have to do the sums in the loop above.
    }
    for (i=n;i>=1;i--) {        Now we do the backsubstitution, equation (2.3.7).
        sum=b[i];
        for (j=i+1;j<=n;j++) sum -= a[i][j]*b[j];
        b[i]=sum/a[i][i];       Store a component of the solution vector X.
    }                           All done!
}

The LU decomposition in ludcmp requires about (1/3)N^3 executions of the inner loops (each with one multiply and one add). This is thus the operation count for solving one (or a few) right-hand sides, and is a factor of 3 better than the Gauss-Jordan routine gaussj which was given in §2.1, and a factor of 1.5 better than a Gauss-Jordan routine (not given) that does not compute the inverse matrix. For inverting a matrix, the total count (including the forward and backsubstitution as discussed following equation 2.3.7 above) is (1/3 + 1/6 + 1/2)N^3 = N^3, the same as gaussj.

To summarize, this is the preferred way to solve the linear set of equations A · x = b:

float **a,*b,d;
int n,*indx;
...
ludcmp(a,n,indx,&d);
lubksb(a,n,indx,b);

The answer x will be given back in b. Your original matrix A will have been destroyed. If you subsequently want to solve a set of equations with the same A but a different right-hand side b, you repeat only

lubksb(a,n,indx,b);

not, of course, with the original matrix A, but with a and indx as were already set by ludcmp.

Inverse of a Matrix

Using the above LU decomposition and backsubstitution routines, it is completely straightforward to find the inverse of a matrix column by column.

#define N ...
float **a,**y,d,*col;
int i,j,*indx;
...
ludcmp(a,N,indx,&d);            Decompose the matrix just once.
for(j=1;j<=N;j++) {             Find inverse by columns.
    for(i=1;i<=N;i++) col[i]=0.0;
    col[j]=1.0;
    lubksb(a,N,indx,col);
    for(i=1;i<=N;i++) y[i][j]=col[i];
}

The matrix y will now contain the inverse of the original matrix a, which will have been destroyed. Alternatively, there is nothing wrong with using a Gauss-Jordan routine like gaussj (§2.1) to invert a matrix in place, again destroying the original. Both methods have practically the same operations count.

Incidentally, if you ever have the need to compute A^{-1} · B from matrices A and B, you should LU decompose A and then backsubstitute with the columns of B instead of with the unit vectors that would give A's inverse. This saves a whole matrix multiplication, and is also more accurate.

Determinant of a Matrix

The determinant of an LU decomposed matrix is just the product of the diagonal elements,

\[ \det = \prod_{j=1}^{N} \beta_{jj} \tag{2.3.15} \]

We don't, recall, compute the decomposition of the original matrix, but rather a decomposition of a rowwise permutation of it. Luckily, we have kept track of whether the number of row interchanges was even or odd, so we just preface the product by the corresponding sign. (You now finally know the purpose of setting d in the routine ludcmp.)

Calculation of a determinant thus requires one call to ludcmp, with no subsequent backsubstitutions by lubksb.

#define N ...
float **a,d;
int j,*indx;
...
ludcmp(a,N,indx,&d);            This returns d as ±1.
for(j=1;j<=N;j++) d *= a[j][j];

The variable d now contains the determinant of the original matrix a, which will have been destroyed.

For a matrix of any substantial size, it is quite likely that the determinant will overflow or underflow your computer's floating-point dynamic range. In this case you can modify the loop of the above fragment and (e.g.) divide by powers of ten, to keep track of the scale separately, or (e.g.) accumulate the sum of logarithms of the absolute values of the factors and the sign separately.

Complex Systems of Equations

If your matrix A is real, but the right-hand side vector is complex, say b + id, then (i) LU decompose A in the usual way, (ii) backsubstitute b to get the real part of the solution vector, and (iii) backsubstitute d to get the imaginary part of the solution vector.

If the matrix itself is complex, so that you want to solve the system

\[ (A + iC) \cdot (x + iy) = (b + id) \tag{2.3.16} \]

then there are two possible ways to proceed. The best way is to rewrite ludcmp and lubksb as complex routines. Complex modulus substitutes for absolute value in the construction of the scaling vector vv and in the search for the largest pivot elements. Everything else goes through in the obvious way, with complex arithmetic used as needed. (See §§1.2 and 5.4 for discussion of complex arithmetic in C.)

A quick-and-dirty way to solve complex systems is to take the real and imaginary parts of (2.3.16), giving

\[
\begin{aligned}
A \cdot x - C \cdot y &= b \\
C \cdot x + A \cdot y &= d
\end{aligned}
\tag{2.3.17}
\]

which can be written as a 2N × 2N set of real equations,

\[
\begin{bmatrix} A & -C \\ C & A \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}
\tag{2.3.18}
\]

and then solved with ludcmp and lubksb in their present forms. This scheme is a factor of 2 inefficient in storage, since A and C are stored twice. It is also a factor of 2 inefficient in time, since the complex multiplies in a complexified version of the routines would each use 4 real multiplies, while the solution of a 2N × 2N problem involves 8 times the work of an N × N one. If you can tolerate these factor-of-two inefficiencies, then equation (2.3.18) is an easy way to proceed.

CITED REFERENCES AND FURTHER READING:
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), Chapter 4.
Dongarra, J.J., et al. 1979, LINPACK User's Guide (Philadelphia: S.I.A.M.).
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), §3.3, and p. 50.
Forsythe, G.E., and Moler, C.B. 1967, Computer Solution of Linear Algebraic Systems (Englewood Cliffs, NJ: Prentice-Hall), Chapters 9, 16, and 18.
Westlake, J.R. 1968, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations (New York: Wiley).
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §4.2.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.11.
Horn, R.A., and Johnson, C.R. 1985, Matrix Analysis (Cambridge: Cambridge University Press).

2.4 Tridiagonal and Band Diagonal Systems of Equations

The special case of a system of linear equations that is tridiagonal, that is, has nonzero elements only on the diagonal plus or minus one column, is one that occurs frequently. Also common are systems that are band diagonal, with nonzero elements only along a few diagonal lines adjacent to the main diagonal (above and below).

For tridiagonal sets, the procedures of LU decomposition, forward- and back-substitution each take only O(N) operations, and the whole solution can be encoded very concisely. The resulting routine tridag is one that we will use in later chapters. Naturally, one does not reserve storage for the full N × N matrix, but only for the nonzero components, stored as three vectors. The set of equations to be solved is

\[
\begin{bmatrix}
b_1 & c_1 & 0 & \cdots & & \\
a_2 & b_2 & c_2 & \cdots & & \\
& & \cdots & & & \\
& & \cdots & a_{N-1} & b_{N-1} & c_{N-1} \\
& & \cdots & 0 & a_N & b_N
\end{bmatrix}
\cdot
\begin{bmatrix} u_1 \\ u_2 \\ \cdots \\ u_{N-1} \\ u_N \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \\ \cdots \\ r_{N-1} \\ r_N \end{bmatrix}
\tag{2.4.1}
\]

Notice that a_1 and c_N are undefined and are not referenced by the routine that follows.

#include "nrutil.h"

void tridag(float a[], float b[], float c[], float r[], float u[], unsigned long n)
Solves for a vector u[1..n] the tridiagonal linear set given by equation (2.4.1). a[1..n], b[1..n], c[1..n], and r[1..n] are input vectors and are not modified.
{
    unsigned long j;
    float bet,*gam;

    gam=vector(1,n);            One vector of workspace, gam, is needed.
    if (b[1] == 0.0) nrerror("Error 1 in tridag");
    If this happens then you should rewrite your equations as a set of order N−1, with u_2 trivially eliminated.
    u[1]=r[1]/(bet=b[1]);
    for (j=2;j<=n;j++) {        Decomposition and forward substitution.
        gam[j]=c[j-1]/bet;
        bet=b[j]-a[j]*gam[j];
        if (bet == 0.0) nrerror("Error 2 in tridag");   Algorithm fails; see below.
        u[j]=(r[j]-a[j]*u[j-1])/bet;
    }
    for (j=(n-1);j>=1;j--)
        u[j] -= gam[j+1]*u[j+1];    Backsubstitution.
    free_vector(gam,1,n);
}

There is no pivoting in tridag. It is for this reason that tridag can fail even when the underlying matrix is nonsingular: A zero pivot can be encountered even for a nonsingular matrix. In practice, this is not something to lose sleep about. The kinds of problems that lead to tridiagonal linear sets usually have additional properties which guarantee that the algorithm in tridag will succeed. For example, if

\[ |b_j| > |a_j| + |c_j| \qquad j = 1, \ldots, N \tag{2.4.2} \]

(called diagonal dominance) then it can be shown that the algorithm cannot encounter a zero pivot.

It is possible to construct special examples in which the lack of pivoting in the algorithm causes numerical instability. In practice, however, such instability is almost never encountered — unlike the general matrix problem where pivoting is essential.

The tridiagonal algorithm is the rare case of an algorithm that, in practice, is more robust than theory says it should be. Of course, should you ever encounter a problem for which tridag fails, you can instead use the more general method for band diagonal systems, now described (routines bandec and banbks).

Some other matrix forms consisting of tridiagonal with a small number of additional elements (e.g., upper right and lower left corners) also allow rapid solution; see §2.7.

Band Diagonal Systems

Where tridiagonal systems have nonzero elements only on the diagonal plus or minus one, band diagonal systems are slightly more general and have (say) m1 ≥ 0 nonzero elements immediately to the left of (below) the diagonal and m2 ≥ 0 nonzero elements immediately to its right (above it). Of course, this is only a useful classification if m1 and m2 are both ≪ N.

In that case, the solution of the linear system by LU decomposition can be accomplished much faster, and in much less storage, than for the general N × N case. The precise definition of a band diagonal matrix with elements a_{ij} is that

\[ a_{ij} = 0 \quad \text{when} \quad j > i + m_2 \quad \text{or} \quad i > j + m_1 \tag{2.4.3} \]

Band diagonal matrices are stored and manipulated in a so-called compact form, which results if the matrix is tilted 45° clockwise, so that its nonzero elements lie in a long, narrow matrix with m1 + 1 + m2 columns and N rows. This is best illustrated by an example: The band diagonal matrix

\[
\begin{bmatrix}
3 & 1 & 0 & 0 & 0 & 0 & 0 \\
4 & 1 & 5 & 0 & 0 & 0 & 0 \\
9 & 2 & 6 & 5 & 0 & 0 & 0 \\
0 & 3 & 5 & 8 & 9 & 0 & 0 \\
0 & 0 & 7 & 9 & 3 & 2 & 0 \\
0 & 0 & 0 & 3 & 8 & 4 & 6 \\
0 & 0 & 0 & 0 & 2 & 4 & 4
\end{bmatrix}
\tag{2.4.4}
\]

which has N = 7, m1 = 2, and m2 = 1, is stored compactly as the 7 × 4 matrix,

\[
\begin{bmatrix}
x & x & 3 & 1 \\
x & 4 & 1 & 5 \\
9 & 2 & 6 & 5 \\
3 & 5 & 8 & 9 \\
7 & 9 & 3 & 2 \\
3 & 8 & 4 & 6 \\
2 & 4 & 4 & x
\end{bmatrix}
\tag{2.4.5}
\]

Here x denotes elements that are wasted space in the compact format; these will not be referenced by any manipulations and can have arbitrary values. Notice that the diagonal of the original matrix appears in column m1 + 1, with subdiagonal elements to its left, superdiagonal elements to its right.

The simplest manipulation of a band diagonal matrix, stored compactly, is to multiply it by a vector to its right. Although this is algorithmically trivial, you might want to study the following routine carefully, as an example of how to pull nonzero elements a_{ij} out of the compact storage format in an orderly fashion.

#include "nrutil.h"

void banmul(float **a, unsigned long n, int m1, int m2, float x[], float b[])
Matrix multiply b = A · x, where A is band diagonal with m1 rows below the diagonal and m2 rows above. The input vector x and output vector b are stored as x[1..n] and b[1..n], respectively. The array a[1..n][1..m1+m2+1] stores A as follows: The diagonal elements are in a[1..n][m1+1]. Subdiagonal elements are in a[j..n][1..m1] (with j > 1 appropriate to the number of elements on each subdiagonal). Superdiagonal elements are in a[1..j][m1+2..m1+m2+1] with j < n appropriate to the number of elements on each superdiagonal.
{
    unsigned long i,j,k,tmploop;

    for (i=1;i<=n;i++) {
        k=i-m1-1;
        tmploop=LMIN(m1+m2+1,n-k);
        b[i]=0.0;
        for (j=LMAX(1,1-k);j<=tmploop;j++) b[i] += a[i][j]*x[j+k];
    }
}
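As a check on your understanding of the compact format, the following hypothetical driver (not a book fragment) loads the compact storage (2.4.5) for the matrix (2.4.4) and multiplies it into the vector of all ones, so that each b[i] comes back as the ith row sum of (2.4.4):

#include <stdio.h>
#include "nrutil.h"

void banmul(float **a, unsigned long n, int m1, int m2, float x[], float b[]);

int main(void)
{
    unsigned long i,j,n=7;
    int m1=2,m2=1;
    static float compact[7][4]={        /* rows of equation (2.4.5), 0.0 in the wasted slots */
        {0.0,0.0,3.0,1.0},{0.0,4.0,1.0,5.0},{9.0,2.0,6.0,5.0},
        {3.0,5.0,8.0,9.0},{7.0,9.0,3.0,2.0},{3.0,8.0,4.0,6.0},{2.0,4.0,4.0,0.0}};
    float **a=matrix(1,n,1,m1+m2+1),*x=vector(1,n),*b=vector(1,n);

    for (i=1;i<=n;i++) {
        for (j=1;j<=(unsigned long)(m1+m2+1);j++) a[i][j]=compact[i-1][j-1];
        x[i]=1.0;                       /* multiply by the vector of all ones */
    }
    banmul(a,n,m1,m2,x,b);
    for (i=1;i<=n;i++) printf("b[%lu] = %g\n",i,b[i]);
    free_vector(b,1,n); free_vector(x,1,n); free_matrix(a,1,n,1,m1+m2+1);
    return 0;
}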

It is not possible to store the LU decomposition of a band diagonal matrix as compactly as the compact form of A itself. The decomposition (essentially by Crout's method, see §2.3) produces additional nonzero "fill-ins." One straightforward storage scheme is to return the upper triangular factor (U) in the same space that A previously occupied, and to return the lower triangular factor (L) in a separate compact matrix of size N × m1. The diagonal elements of U (whose product, times d = ±1, gives the determinant) are returned in the first column of A's storage space.

The following routine, bandec, is the band-diagonal analog of ludcmp in §2.3:

#include <math.h>
#define SWAP(a,b) {dum=(a);(a)=(b);(b)=dum;}
#define TINY 1.0e-20

void bandec(float **a, unsigned long n, int m1, int m2, float **al, unsigned long indx[], float *d)
Given an n × n band diagonal matrix A with m1 subdiagonal rows and m2 superdiagonal rows, compactly stored in the array a[1..n][1..m1+m2+1] as described in the comment for routine banmul, this routine constructs an LU decomposition of a rowwise permutation of A. The upper triangular matrix replaces a, while the lower triangular matrix is returned in al[1..n][1..m1]. indx[1..n] is an output vector which records the row permutation effected by the partial pivoting; d is output as ±1 depending on whether the number of row interchanges was even or odd, respectively. This routine is used in combination with banbks to solve band-diagonal sets of equations.
{
    unsigned long i,j,k,l;
    int mm;
    float dum;

    mm=m1+m2+1;
    l=m1;
    for (i=1;i<=m1;i++) {       Rearrange the storage a bit.
        for (j=m1+2-i;j<=mm;j++) a[i][j-l]=a[i][j];
        l--;
        for (j=mm-l;j<=mm;j++) a[i][j]=0.0;
    }
    *d=1.0;
    l=m1;
    for (k=1;k<=n;k++) {        For each row...
        dum=a[k][1];
        i=k;
        if (l < n) l++;
        for (j=k+1;j<=l;j++) {  Find the pivot element.
            if (fabs(a[j][1]) > fabs(dum)) {
                dum=a[j][1];
                i=j;
            }
        }
        indx[k]=i;
        if (dum == 0.0) a[k][1]=TINY;
        Matrix is algorithmically singular, but proceed anyway with TINY pivot (desirable in some applications).
        if (i != k) {           Interchange rows.
            *d = -(*d);
            for (j=1;j<=mm;j++) SWAP(a[k][j],a[i][j])
        }
        for (i=k+1;i<=l;i++) {  Do the elimination.
            dum=a[i][1]/a[k][1];
            al[k][i-k]=dum;
            for (j=2;j<=mm;j++) a[i][j-1]=a[i][j]-dum*a[k][j];
            a[i][mm]=0.0;
        }
    }
}
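Since the diagonal of U is returned in the first column of a's storage, the determinant of a band diagonal matrix follows from bandec alone, in direct analogy with the ludcmp fragment of §2.3. A hedged sketch, not a book fragment:

#define N ...
float **a,**al,d;
int m1,m2;
unsigned long j,*indx;
...
bandec(a,N,m1,m2,al,indx,&d);       /* returns d as +1 or -1 */
for (j=1;j<=N;j++) d *= a[j][1];    /* the diagonal of U sits in column 1 */

As in §2.3, for a matrix of substantial size the product may overflow or underflow, and the same remedies (scaling, or summing logarithms) apply.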

Some pivoting is possible within the storage limitations of bandec, and the above routine does take advantage of the opportunity. In general, when TINY is returned as a diagonal element of U, then the original matrix (perhaps as modified by roundoff error) is in fact singular. In this regard, bandec is somewhat more robust than tridag above, which can fail algorithmically even for nonsingular matrices; bandec is thus also useful (with m1 = m2 = 1) for some ill-behaved tridiagonal systems.

Once the matrix A has been decomposed, any number of right-hand sides can be solved in turn by repeated calls to banbks, the backsubstitution routine whose analog in §2.3 is lubksb.

#define SWAP(a,b) {dum=(a);(a)=(b);(b)=dum;}

void banbks(float **a, unsigned long n, int m1, int m2, float **al, unsigned long indx[], float b[])
Given the arrays a, al, and indx as returned from bandec, and given a right-hand side vector b[1..n], solves the band diagonal linear equations A · x = b. The solution vector x overwrites b[1..n]. The other input arrays are not modified, and can be left in place for successive calls with different right-hand sides.
{
    unsigned long i,k,l;
    int mm;
    float dum;

    mm=m1+m2+1;
    l=m1;
    for (k=1;k<=n;k++) {        Forward substitution, unscrambling the permuted rows as we go.
        i=indx[k];
        if (i != k) SWAP(b[k],b[i])
        if (l < n) l++;
        for (i=k+1;i<=l;i++) b[i] -= al[k][i-k]*b[k];
    }
    l=1;
    for (i=n;i>=1;i--) {        Backsubstitution.
        dum=b[i];
        for (k=2;k<=l;k++) dum -= a[i][k]*b[k+i-1];
        b[i]=dum/a[i][1];
        if (l < mm) l++;
    }
}

The routines bandec and banbks are based on the Handbook routines bandet1 and bansol1 [1].

CITED REFERENCES AND FURTHER READING:
Keller, H.B. 1968, Numerical Methods for Two-Point Boundary-Value Problems (Waltham, MA: Blaisdell), p. 74.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), Example 5.4.3, p. 166.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.11.
Wilkinson, J.H., and Reinsch, C. 1971, Linear Algebra, vol. II of Handbook for Automatic Computation (New York: Springer-Verlag), Chapter I/6. [1]
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), §4.3.

Figure 2.5.1. Iterative improvement of the solution to A · x = b. The first guess x + δx is multiplied by A to produce b + δb. The known vector b is subtracted, giving δb. The linear set with this right-hand side is inverted, giving δx. This is subtracted from the first guess giving an improved solution x.

2.5 Iterative Improvement of a Solution to Linear Equations

Obviously it is not easy to obtain greater precision for the solution of a linear set than the precision of your computer's floating-point word. Unfortunately, for large sets of linear equations, it is not always easy to obtain precision equal to, or even comparable to, the computer's limit. In direct methods of solution, roundoff errors accumulate, and they are magnified to the extent that your matrix is close to singular. You can easily lose two or three significant figures for matrices which (you thought) were far from singular.

If this happens to you, there is a neat trick to restore the full machine precision, called iterative improvement of the solution. The theory is very straightforward (see Figure 2.5.1): Suppose that a vector x is the exact solution of the linear set

    A · x = b        (2.5.1)

You don't, however, know x. You only know some slightly wrong solution x + δx, where δx is the unknown error. When multiplied by the matrix A, your slightly wrong solution gives a product slightly discrepant from the desired right-hand side b, namely

    A · (x + δx) = b + δb        (2.5.2)

Subtracting (2.5.1) from (2.5.2) gives

    A · δx = δb        (2.5.3)

But (2.5.2) can also be solved, trivially, for δb. Substituting this into (2.5.3) gives

    A · δx = A · (x + δx) − b        (2.5.4)

In this equation, the whole right-hand side is known, since x + δx is the wrong solution that you want to improve. It is essential to calculate the right-hand side in double precision, since there will be a lot of cancellation in the subtraction of b. Then, we need only solve (2.5.4) for the error δx, then subtract this from the wrong solution to get an improved solution.

An important extra benefit occurs if we obtained the original solution by LU decomposition. In this case we already have the LU decomposed form of A, and all we need do to solve (2.5.4) is compute the right-hand side and backsubstitute!

The code to do all this is concise and straightforward:

#include "nrutil.h"

void mprove(float **a, float **alud, int n, int indx[], float b[], float x[])
/* Improves a solution vector x[1..n] of the linear set of equations A · X = B. The matrix
a[1..n][1..n], and the vectors b[1..n] and x[1..n] are input, as is the dimension n. Also
input is alud[1..n][1..n], the LU decomposition of a as returned by ludcmp, and the vector
indx[1..n] also returned by that routine. On output, only x[1..n] is modified, to an
improved set of values. */
{
    void lubksb(float **a, int n, int *indx, float b[]);
    int j,i;
    double sdp;
    float *r;

    r=vector(1,n);
    for (i=1;i<=n;i++) {        /* Calculate the right-hand side, accumulating
                                   the residual in double precision. */
        sdp = -b[i];
        for (j=1;j<=n;j++) sdp += a[i][j]*x[j];
        r[i]=sdp;
    }
    lubksb(alud,n,indx,r);      /* Solve for the error term, */
    for (i=1;i<=n;i++) x[i] -= r[i];   /* and subtract it from the old solution. */
    free_vector(r,1,n);
}

You should note that the routine ludcmp in §2.3 destroys the input matrix as it decomposes it. Since iterative improvement requires both the original matrix and its LU decomposition, you will need to copy A before calling ludcmp. Likewise lubksb destroys b in obtaining x, so make a copy of b also. If you don't mind this extra storage, iterative improvement is highly recommended: It is a process of order only N² operations (multiply vector by matrix, and backsubstitute; see discussion following equation 2.3.7); it never hurts; and it can really give you your money's worth if it saves an otherwise ruined solution on which you have already spent of order N³ operations.

You can call mprove several times in succession if you want. Unless you are starting quite far from the true solution, one call is generally enough; but a second call to verify convergence can be reassuring.
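A minimal usage sketch of the copy-decompose-polish sequence just described (our addition, not the book's code): it assumes the book's ludcmp, lubksb, and nrutil allocators are linked in, and the wrapper name solve_and_polish is hypothetical.

#include "nrutil.h"   /* assumed: the book's matrix() and ivector() allocators */

void solve_and_polish(float **a, float b[], float x[], int n)
{
    void ludcmp(float **a, int n, int *indx, float *d);
    void lubksb(float **a, int n, int *indx, float b[]);
    void mprove(float **a, float **alud, int n, int indx[], float b[], float x[]);
    float **alud, d;
    int i, j, *indx;

    alud=matrix(1,n,1,n);
    indx=ivector(1,n);
    for (i=1;i<=n;i++) {
        x[i]=b[i];                               /* lubksb overwrites its right-hand side */
        for (j=1;j<=n;j++) alud[i][j]=a[i][j];   /* ludcmp destroys its input matrix */
    }
    ludcmp(alud,n,indx,&d);
    lubksb(alud,n,indx,x);          /* x is now the (roundoff-tainted) solution */
    mprove(a,alud,n,indx,b,x);      /* one improvement pass; call again if desired */
    free_ivector(indx,1,n);
    free_matrix(alud,1,n,1,n);
}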

More on Iterative Improvement

It is illuminating (and will be useful later in the book) to give a somewhat more solid analytical foundation for equation (2.5.4), and also to give some additional results. Implicit in the previous discussion was the notion that the solution vector x + δx has an error term; but we neglected the fact that the LU decomposition of A is itself not exact.

A different analytical approach starts with some matrix B0 that is assumed to be an approximate inverse of the matrix A, so that B0 · A is approximately the identity matrix 1. Define the residual matrix R of B0 as

    R ≡ 1 − B0 · A        (2.5.5)

which is supposed to be "small" (we will be more precise below). Note that therefore

    B0 · A = 1 − R        (2.5.6)

Next consider the following formal manipulation:

    A⁻¹ = A⁻¹ · (B0⁻¹ · B0) = (A⁻¹ · B0⁻¹) · B0 = (B0 · A)⁻¹ · B0
        = (1 − R)⁻¹ · B0 = (1 + R + R² + R³ + ···) · B0        (2.5.7)

We can define the nth partial sum of the last expression by

    Bn ≡ (1 + R + ··· + Rⁿ) · B0        (2.5.8)

so that Bn → A⁻¹ as n → ∞, if the limit exists.

It now is straightforward to verify that equation (2.5.8) satisfies some interesting recurrence relations. As regards solving A · x = b, where x and b are vectors, define

    xn ≡ Bn · b        (2.5.9)

Then it is easy to show that

    x_{n+1} = x_n + B0 · (b − A · x_n)        (2.5.10)

This is immediately recognizable as equation (2.5.4), with −δx = x_{n+1} − x_n, and with B0 taking the role of A⁻¹. We see, therefore, that equation (2.5.4) does not require that the LU decomposition of A be exact, but only that the implied residual R be small. In rough terms, if the residual is smaller than the square root of your computer's roundoff error, then after one application of equation (2.5.10) (that is, going from x0 ≡ B0 · b to x1) the first neglected term, of order R², will be smaller than the roundoff error. Equation (2.5.10), like equation (2.5.4), moreover, can be applied more than once, since it uses only B0, and not any of the higher Bn's.

A much more surprising recurrence which follows from equation (2.5.8) is one that more than doubles the order n at each stage:

    B_{2n+1} = 2 Bn − Bn · A · Bn,    n = 0, 1, 3, 7, ...        (2.5.11)

Repeated application of equation (2.5.11), from a suitable starting matrix B0, converges quadratically to the unknown inverse matrix A⁻¹ (see §9.4 for the definition of "quadratically"). Equation (2.5.11) goes by various names, including Schultz's Method and Hotelling's Method; see Pan and Reif [1] for references. In fact, equation (2.5.11) is simply the iterative Newton-Raphson method of root-finding (§9.4) applied to matrix inversion.

Before you get too excited about equation (2.5.11), however, you should notice that it involves two full matrix multiplications at each iteration. Each matrix multiplication involves N³ adds and multiplies. But we already saw in §§2.1–2.3 that direct inversion of A requires only N³ adds and N³ multiplies in toto. Equation (2.5.11) is therefore practical only when special circumstances allow it to be evaluated much more rapidly than is the case for general matrices. We will meet such circumstances later, in §13.10.

In the spirit of delayed gratification, let us nevertheless pursue the two related issues: When does the series in equation (2.5.7) converge; and what is a suitable initial guess B0 (if, for example, an initial LU decomposition is not feasible)?
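Before turning to those questions, a small self-contained illustration of the recurrence (2.5.11) may help (our sketch, not the book's code). The test matrix and the crude starting guess, the inverse of A's diagonal, are placeholders chosen so that the iteration demonstrably converges.

#include <stdio.h>

#define N 3

static void matmul(double c[N][N], double a[N][N], double b[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < N; k++) c[i][j] += a[i][k]*b[k][j];
        }
}

int main(void)
{
    /* A diagonally dominant test matrix; B is initialized to the inverse of
       diag(A), which makes the residual R = 1 - B.A small enough to converge. */
    double a[N][N] = {{4,1,0},{1,3,1},{0,1,2}};
    double b[N][N] = {{0.25,0,0},{0,1.0/3.0,0},{0,0,0.5}};
    double ab[N][N], bab[N][N];
    int it, i, j;

    for (it = 0; it < 6; it++) {      /* the order more than doubles each pass */
        matmul(ab, a, b);             /* ab  = A.B  */
        matmul(bab, b, ab);           /* bab = B.A.B */
        for (i = 0; i < N; i++)       /* (2.5.11): B <- 2B - B.A.B */
            for (j = 0; j < N; j++)
                b[i][j] = 2.0*b[i][j] - bab[i][j];
    }
    matmul(ab, a, b);                 /* A.B should now be close to the identity */
    for (i = 0; i < N; i++, printf("\n"))
        for (j = 0; j < N; j++) printf("%10.6f", ab[i][j]);
    return 0;
}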

We can define the norm of a matrix as the largest amplification of length that it is able to induce on a vector,

    \| R \| \equiv \max_{v \ne 0} \frac{|R \cdot v|}{|v|}        (2.5.12)

If we let equation (2.5.7) act on some arbitrary right-hand side b, as one wants a matrix inverse to do, it is obvious that a sufficient condition for convergence is

    ‖R‖ < 1        (2.5.13)

Pan and Reif [1] point out that a suitable initial guess for B0 is any sufficiently small constant ε times the matrix transpose of A, that is,

    B0 = ε Aᵀ    or    R = 1 − ε Aᵀ · A        (2.5.14)

To see why this is so involves concepts from Chapter 11; we give here only the briefest sketch: Aᵀ · A is a symmetric, positive definite matrix, so it has real, positive eigenvalues. In its diagonal representation, R takes the form

    R = diag(1 − ελ₁, 1 − ελ₂, ..., 1 − ελ_N)        (2.5.15)

where all the λᵢ's are positive. Evidently any ε satisfying 0 < ε < 2/(maxᵢ λᵢ) will give ‖R‖ < 1. It is not difficult to show that the optimal choice for ε, giving the most rapid convergence for equation (2.5.11), is

    ε = 2/(maxᵢ λᵢ + minᵢ λᵢ)        (2.5.16)

Rarely does one know the eigenvalues of Aᵀ · A in equation (2.5.16). Pan and Reif derive several interesting bounds, which are computable directly from A. The following choices guarantee the convergence of Bn as n → ∞,

    \epsilon \le 1 \Big/ \sum_{j,k} a_{jk}^2 \quad \text{or} \quad \epsilon \le 1 \Big/ \Big( \max_i \sum_j |a_{ij}| \times \max_j \sum_i |a_{ij}| \Big)        (2.5.17)

The latter expression is truly a remarkable formula, which Pan and Reif derive by noting that the vector norm in equation (2.5.12) need not be the usual L₂ norm, but can instead be either the L∞ (max) norm, or the L₁ (absolute value) norm. See their work for details.

Another approach, with which we have had some success, is to estimate the largest eigenvalue statistically, by calculating sᵢ ≡ |A · vᵢ|² for several unit vectors vᵢ with randomly chosen directions in N-space. The largest eigenvalue λ can then be bounded by the maximum of 2 maxᵢ sᵢ and 2N Var(sᵢ)/μ(sᵢ), where Var and μ denote the sample variance and mean, respectively.

CITED REFERENCES AND FURTHER READING:
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §2.3.4, p. 55.
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), p. 74.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §5.5.6, p. 183.
Forsythe, G.E., and Moler, C.B. 1967, Computer Solution of Linear Algebraic Systems (Englewood Cliffs, NJ: Prentice-Hall), Chapter 13.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.5, p. 437.
Pan, V., and Reif, J. 1985, in Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (New York: Association for Computing Machinery). [1]
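As a coda to the starting-guess discussion above, here is a short self-contained sketch (our addition) that forms B0 = ε Aᵀ with ε taken from the first, Frobenius-norm, bound in (2.5.17). The test matrix is a placeholder; the resulting B0 could be fed directly into the iteration (2.5.11).

#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N] = {{4,1,0},{1,3,1},{0,1,2}};
    double b0[N][N], frob2 = 0.0, eps;
    int i, j;

    for (i = 0; i < N; i++)                   /* sum of squares of all elements */
        for (j = 0; j < N; j++) frob2 += a[i][j]*a[i][j];
    eps = 1.0/frob2;                          /* guarantees ||R|| < 1 by (2.5.17) */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) b0[i][j] = eps*a[j][i];   /* B0 = eps * A^T */
    for (i = 0; i < N; i++, printf("\n"))
        for (j = 0; j < N; j++) printf("%10.6f", b0[i][j]);
    return 0;
}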

2.6 Singular Value Decomposition

There exists a very powerful set of techniques for dealing with sets of equations or matrices that are either singular or else numerically very close to singular. In many cases where Gaussian elimination and LU decomposition fail to give satisfactory results, this set of techniques, known as singular value decomposition, or SVD, will diagnose for you precisely what the problem is. In some cases, SVD will not only diagnose the problem, it will also solve it, in the sense of giving you a useful numerical answer, although, as we shall see, not necessarily "the" answer that you thought you should get.

SVD is also the method of choice for solving most linear least-squares problems. We will outline the relevant theory in this section, but defer detailed discussion of the use of SVD in this application to Chapter 15, whose subject is the parametric modeling of data.

SVD methods are based on the following theorem of linear algebra, whose proof is beyond our scope: Any M × N matrix A whose number of rows M is greater than or equal to its number of columns N, can be written as the product of an M × N column-orthogonal matrix U, an N × N diagonal matrix W with positive or zero elements (the singular values), and the transpose of an N × N orthogonal matrix V. In tableau form,

    A = U · diag(w₁, ..., w_N) · Vᵀ        (2.6.1)

where A and U are M × N matrices, while W = diag(w₁, ..., w_N) and Vᵀ are N × N.

The matrices U and V are each orthogonal in the sense that their columns are orthonormal,

    \sum_{i=1}^{M} U_{ik} U_{in} = \delta_{kn}, \qquad 1 \le k \le N,\ 1 \le n \le N        (2.6.2)

    \sum_{j=1}^{N} V_{jk} V_{jn} = \delta_{kn}, \qquad 1 \le k \le N,\ 1 \le n \le N        (2.6.3)

or as a tableau,

    Uᵀ · U = Vᵀ · V = 1        (2.6.4)

where 1 denotes the N × N identity matrix. Since V is square, it is also row-orthonormal, V · Vᵀ = 1.

SVD of a Square Matrix

If the matrix A is square, N × N say, then U, V, and W are all square matrices of the same size. Their inverses are also trivial to compute: U and V are orthogonal, so their inverses are equal to their transposes; W is diagonal, so its inverse is the diagonal matrix whose elements are the reciprocals of the elements w_j. From (2.6.1) it now follows immediately that the inverse of A is

    A⁻¹ = V · [diag(1/w_j)] · Uᵀ        (2.6.5)

The only thing that can go wrong with this construction is for one of the w_j's to be zero, or (numerically) for it to be so small that its value is dominated by roundoff error and therefore unknowable. If more than one of the w_j's have this problem, then the matrix is even more singular. So, first of all, SVD gives you a clear diagnosis of the situation.

Formally, the condition number of a matrix is defined as the ratio of the largest (in magnitude) of the w_j's to the smallest of the w_j's. A matrix is singular if its condition number is infinite, and it is ill-conditioned if its condition number is too large, that is, if its reciprocal approaches the machine's floating-point precision (for example, less than 10⁻⁶ for single precision or 10⁻¹² for double).

For singular matrices, the concepts of nullspace and range are important. Consider the familiar set of simultaneous equations

    A · x = b        (2.6.6)

where A is a square matrix, and x and b are vectors. Equation (2.6.6) defines A as a linear mapping from the vector space x to the vector space b. If A is singular, then there is some subspace of x, called the nullspace, that is mapped to zero, A · x = 0. The dimension of the nullspace (the number of linearly independent vectors x that can be found in it) is called the nullity of A.

Now, there is also some subspace of b that can be "reached" by A, in the sense that there exists some x which is mapped there. This subspace of b is called the range of A. The dimension of the range is called the rank of A. If A is nonsingular, then its range will be all of the vector space b, so its rank is N. If A is singular, then the rank will be less than N. In fact, the relevant theorem is "rank plus nullity equals N."

What has this to do with SVD? SVD explicitly constructs orthonormal bases for the nullspace and range of a matrix. Specifically, the columns of U whose same-numbered elements w_j are nonzero are an orthonormal set of basis vectors that span the range; the columns of V whose same-numbered elements w_j are zero are an orthonormal basis for the nullspace.

Now let's have another look at solving the set of simultaneous linear equations (2.6.6) in the case that A is singular. First, the set of homogeneous equations, where b = 0, is solved immediately by SVD: Any column of V whose corresponding w_j is zero yields a solution.

When the vector b on the right-hand side is not zero, the important question is whether it lies in the range of A or not. If it does, then the singular set of equations does have a solution x; in fact it has more than one solution, since any vector in the nullspace (any column of V with a corresponding zero w_j) can be added to x in any linear combination.

If we want to single out one particular member of this solution-set of vectors as a representative, we might want to pick the one with the smallest length |x|². Here is how to find that vector using SVD: Simply replace 1/w_j by zero if w_j = 0. (It is not very often that one gets to set ∞ = 0!) Then compute (working from right to left)

    x = V · [diag(1/w_j)] · (Uᵀ · b)        (2.6.7)

This will be the solution vector of smallest length; the columns of V that are in the nullspace complete the specification of the solution set.

Proof: Consider |x + x′|, where x′ lies in the nullspace. Then, if W⁻¹ denotes the modified inverse of W with some elements zeroed,

    |x + x′| = |V · W⁻¹ · Uᵀ · b + x′|
             = |V · (W⁻¹ · Uᵀ · b + Vᵀ · x′)|        (2.6.8)
             = |W⁻¹ · Uᵀ · b + Vᵀ · x′|

Here the first equality follows from (2.6.7), the second and third from the orthonormality of V. If you now examine the two terms that make up the sum on the right-hand side, you will see that the first one has nonzero j components only where w_j ≠ 0, while the second one, since x′ is in the nullspace, has nonzero j components only where w_j = 0. Therefore the minimum length obtains for x′ = 0, q.e.d.

If b is not in the range of the singular matrix A, then the set of equations (2.6.6) has no solution. But here is some good news: If b is not in the range of A, then equation (2.6.7) can still be used to construct a "solution" vector x. This vector x will not exactly solve A · x = b. But, among all possible vectors x, it will do the closest possible job in the least squares sense. In other words (2.6.7) finds

    x which minimizes r ≡ |A · x − b|        (2.6.9)

The number r is called the residual of the solution.

The proof is similar to (2.6.8): Suppose we modify x by adding some arbitrary x′. Then A · x − b is modified by adding some b′ ≡ A · x′. Obviously b′ is in the range of A. We then have

    |A · x − b + b′| = |(U · W · Vᵀ) · (V · W⁻¹ · Uᵀ · b) − b + b′|
                     = |(U · W · W⁻¹ · Uᵀ − 1) · b + b′|
                     = |U · [(W · W⁻¹ − 1) · Uᵀ · b + Uᵀ · b′]|        (2.6.10)
                     = |(W · W⁻¹ − 1) · Uᵀ · b + Uᵀ · b′|

Now, (W · W⁻¹ − 1) is a diagonal matrix which has nonzero j components only for w_j = 0, while Uᵀ · b′ has nonzero j components only for w_j ≠ 0, since b′ lies in the range of A. Therefore the minimum obtains for b′ = 0, q.e.d.

Figure 2.6.1 summarizes our discussion of SVD thus far.
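As a small illustration of the diagnosis discussed above (our sketch, not the book's code): once the singular values are in hand, the condition number follows directly. The helper name rcond_from_w is hypothetical, and w[1..n] is assumed to hold the output of svdcmp.

/* Returns the reciprocal of the condition number, max w_j / min w_j.
   A value comparable to the machine precision signals ill-conditioning;
   zero signals exact singularity. */
float rcond_from_w(float w[], int n)
{
    float wmax = 0.0, wmin;
    int j;

    for (j = 1; j <= n; j++) if (w[j] > wmax) wmax = w[j];
    wmin = wmax;
    for (j = 1; j <= n; j++) if (w[j] < wmin) wmin = w[j];
    return (wmax > 0.0) ? wmin/wmax : 0.0;
}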

Figure 2.6.1. (a) A nonsingular matrix A maps a vector space into one of the same dimension. The vector x is mapped into b, so that x satisfies the equation A · x = b. (b) A singular matrix A maps a vector space into one of lower dimensionality, here a plane into a line, called the "range" of A. The "nullspace" of A is mapped to zero. The solutions of A · x = d consist of any one particular solution plus any vector in the nullspace, here forming a line parallel to the nullspace. Singular value decomposition (SVD) selects the particular solution closest to zero, as shown. The point c lies outside of the range of A, so A · x = c has no solution. SVD finds the least-squares best compromise solution, namely a solution of A · x = c′, as shown.

In the discussion since equation (2.6.6), we have been pretending that a matrix either is singular or else isn't. That is of course true analytically. Numerically, however, the far more common situation is that some of the w_j's are very small but nonzero, so that the matrix is ill-conditioned. In that case, the direct solution methods of LU decomposition or Gaussian elimination may actually give a formal solution to the set of equations (that is, a zero pivot may not be encountered); but the solution vector may have wildly large components whose algebraic cancellation, when multiplying by the matrix A, may give a very poor approximation to the right-hand vector b. In such cases, the solution vector x obtained by zeroing the

small w_j's and then using equation (2.6.7) is very often better (in the sense of the residual |A · x − b| being smaller) than both the direct-method solution and the SVD solution where the small w_j's are left nonzero.

It may seem paradoxical that this can be so, since zeroing a singular value corresponds to throwing away one linear combination of the set of equations that we are trying to solve. The resolution of the paradox is that we are throwing away precisely a combination of equations that is so corrupted by roundoff error as to be at best useless; usually it is worse than useless since it "pulls" the solution vector way off towards infinity along some direction that is almost a nullspace vector. In doing this, it compounds the roundoff problem and makes the residual |A · x − b| larger.

SVD cannot be applied blindly, then. You have to exercise some discretion in deciding at what threshold to zero the small w_j's, and/or you have to have some idea what size of computed residual |A · x − b| is acceptable.

As an example, here is a "backsubstitution" routine svbksb for evaluating equation (2.6.7) and obtaining a solution vector x from a right-hand side b, given that the SVD of a matrix A has already been calculated by a call to svdcmp. Note that this routine presumes that you have already zeroed the small w_j's. It does not do this for you. If you haven't zeroed the small w_j's, then this routine is just as ill-conditioned as any direct method, and you are misusing SVD.

#include "nrutil.h"

void svbksb(float **u, float w[], float **v, int m, int n, float b[], float x[])
/* Solves A · X = B for a vector X, where A is specified by the arrays u[1..m][1..n],
w[1..n], v[1..n][1..n] as returned by svdcmp. m and n are the dimensions of a, and will
be equal for square matrices. b[1..m] is the input right-hand side. x[1..n] is the output
solution vector. No input quantities are destroyed, so the routine may be called
sequentially with different b's. */
{
    int jj,j,i;
    float s,*tmp;

    tmp=vector(1,n);
    for (j=1;j<=n;j++) {        /* Calculate U^T B. */
        s=0.0;
        if (w[j]) {             /* Nonzero result only if w_j is nonzero. */
            for (i=1;i<=m;i++) s += u[i][j]*b[i];
            s /= w[j];          /* This is the divide by w_j. */
        }
        tmp[j]=s;
    }
    for (j=1;j<=n;j++) {        /* Matrix multiply by V to get answer. */
        s=0.0;
        for (jj=1;jj<=n;jj++) s += v[j][jj]*tmp[jj];
        x[j]=s;
    }
    free_vector(tmp,1,n);
}

Note that a typical use of svdcmp and svbksb superficially resembles the typical use of ludcmp and lubksb: In both cases, you decompose the left-hand matrix A just once, and then can use the decomposition either once or many times with different right-hand sides. The crucial difference is the "editing" of the singular values before svbksb is called:

#define N ...
float wmax,wmin,**a,**u,*w,**v,*b,*x;
int i,j;
...
for(i=1;i<=N;i++)            /* Copy a into u if you don't want it to be destroyed. */
    for (j=1;j<=N;j++)
        u[i][j]=a[i][j];
svdcmp(u,N,N,w,v);           /* SVD the square matrix a. */
wmax=0.0;                    /* Will be the maximum singular value obtained. */
for(j=1;j<=N;j++) if (w[j] > wmax) wmax=w[j];
/* This is where we set the threshold for singular values allowed to be nonzero. The
   constant is typical, but not universal. You have to experiment with your own
   application. */
wmin=wmax*1.0e-6;
for(j=1;j<=N;j++) if (w[j] < wmin) w[j]=0.0;
svbksb(u,w,v,N,N,b,x);       /* Now we can backsubstitute. */

SVD for Fewer Equations than Unknowns

If you have fewer linear equations M than unknowns N, then you are not expecting a unique solution. Usually there will be an N − M dimensional family of solutions. If you want to find this whole solution space, then SVD can readily do the job.

The SVD decomposition will yield N − M zero or negligible w_j's, since M < N. There may be additional zero w_j's from any degeneracies in your M equations. Be sure that you find this many small w_j's, and zero them before calling svbksb, which will give you the particular solution vector x; the columns of V corresponding to the zeroed w_j's are the basis vectors which, added to the particular solution in any linear combination, span the whole solution space.
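A minimal sketch of the last step (our addition, not a book routine; the helper name null_basis is hypothetical): after svdcmp and thresholding, gather the columns of v whose singular values were zeroed.

/* Copies the columns of v[1..n][1..n] whose singular values have been zeroed
   into the columns of nullbasis[1..n][1..nnull]; these span the solution
   (null) space. Returns the nullity found. Assumes w[] has already been
   edited so that negligible singular values are exactly zero. */
int null_basis(float w[], float **v, int n, float **nullbasis)
{
    int i, j, nnull = 0;

    for (j = 1; j <= n; j++)
        if (w[j] == 0.0) {
            ++nnull;
            for (i = 1; i <= n; i++) nullbasis[i][nnull] = v[i][j];
        }
    return nnull;
}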

SVD for More Equations than Unknowns

In the opposite, overdetermined, case of more equations than unknowns (which we will meet again in the least-squares problems of Chapter 15), the best compromise solution in the least-squares sense is again given by (2.6.7), which, with nonsquare matrices, looks like this,

    x = V · [diag(1/w_j)] · (Uᵀ · b)        (2.6.12)

where now x has N components, V and diag(1/w_j) are N × N, Uᵀ is N × M, and b has M components.

In general, the matrix W will not be singular, and no w_j's will need to be set to zero. Occasionally, however, there might be column degeneracies in A. In this case you will need to zero some small w_j values after all. The corresponding column in V gives the linear combination of x's that is then ill-determined even by the supposedly overdetermined set.

Sometimes, although you do not need to zero any w_j's for computational reasons, you may nevertheless want to take note of any that are unusually small: Their corresponding columns in V are linear combinations of x's which are insensitive to your data. In fact, you may then wish to zero these small w_j's, to reduce the number of free parameters in the fit. These matters are discussed more fully in Chapter 15.

Constructing an Orthonormal Basis

Suppose that you have N vectors in an M-dimensional vector space, with N ≤ M. Then the N vectors span some subspace of the full vector space. Often you want to construct an orthonormal set of N vectors that span the same subspace. The textbook way to do this is by Gram-Schmidt orthogonalization, starting with one vector and then expanding the subspace one dimension at a time. Numerically, however, because of the build-up of roundoff errors, naive Gram-Schmidt orthogonalization is terrible.

The right way to construct an orthonormal basis for a subspace is by SVD: Form an M × N matrix A whose N columns are your vectors. Run the matrix through svdcmp. The columns of the matrix U (which in fact replaces A on output from svdcmp) are your desired orthonormal basis vectors.

You might also want to check the output w_j's for zero values. If any occur, then the spanned subspace was not, in fact, N dimensional; the columns of U corresponding to zero w_j's should be discarded from the orthonormal basis set.

(QR factorization, discussed in §2.10, also constructs an orthonormal basis; see [5].)

Approximation of Matrices

Note that equation (2.6.1) can be rewritten to express any matrix A_ij as a sum of outer products of columns of U and rows of Vᵀ, with the "weighting factors" being the singular values w_j,

    A_{ij} = \sum_{k=1}^{N} w_k U_{ik} V_{jk}        (2.6.13)
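A self-contained sketch of the truncated sum (2.6.13) may make this concrete (our addition, not the book's code). The SVD factors below are placeholders, assumed already computed (e.g., by svdcmp, here transcribed to 0-based arrays) and assumed ordered so that the first K singular values are the largest.

#include <stdio.h>

#define M 4
#define N 3
#define K 2    /* number of singular triplets kept */

int main(void)
{
    /* Placeholder factors: u has M rows, v has N rows, w the singular values. */
    float u[M][N] = {{1,0,0},{0,1,0},{0,0,1},{0,0,0}};
    float v[N][N] = {{1,0,0},{0,1,0},{0,0,1}};
    float w[N]    = {5.0f, 2.0f, 1.0e-6f};
    float a[M][N];
    int i, j, k;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = 0.0f;               /* truncated form of (2.6.13): */
            for (k = 0; k < K; k++)       /* keep only the K largest w_k */
                a[i][j] += w[k]*u[i][k]*v[j][k];
        }
    for (i = 0; i < M; i++, printf("\n"))
        for (j = 0; j < N; j++) printf("%10.6f", a[i][j]);
    return 0;
}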

If you ever encounter a situation where most of the singular values w_j of a matrix A are very small, then A will be well-approximated by only a few terms in the sum (2.6.13). This means that you have to store only a few columns of U and V (the same k ones) and you will be able to recover, with good accuracy, the whole matrix.

Note also that it is very efficient to multiply such an approximated matrix by a vector x: You just dot x with each of the stored columns of V, multiply the resulting scalar by the corresponding w_k, and accumulate that multiple of the corresponding column of U. If your matrix is approximated by a small number K of singular values, then this computation of A · x takes only about K(M + N) multiplications, instead of MN for the full matrix.

SVD Algorithm

Here is the algorithm for constructing the singular value decomposition of any matrix. See §11.2–§11.3, and also [4-5], for discussion relating to the underlying method.

#include <math.h>
#include "nrutil.h"

void svdcmp(float **a, int m, int n, float w[], float **v)
/* Given a matrix a[1..m][1..n], this routine computes its singular value decomposition,
A = U · W · V^T. The matrix U replaces a on output. The diagonal matrix of singular values
W is output as a vector w[1..n]. The matrix V (not the transpose V^T) is output as
v[1..n][1..n]. */
{
    float pythag(float a, float b);
    int flag,i,its,j,jj,k,l,nm;
    float anorm,c,f,g,h,s,scale,x,y,z,*rv1;

    rv1=vector(1,n);
    g=scale=anorm=0.0;        /* Householder reduction to bidiagonal form. */
    for (i=1;i<=n;i++) {
        l=i+1;
        rv1[i]=scale*g;
        g=s=scale=0.0;
        if (i <= m) {
            for (k=i;k<=m;k++) scale += fabs(a[k][i]);
            if (scale) {
                for (k=i;k<=m;k++) {
                    a[k][i] /= scale;
                    s += a[k][i]*a[k][i];
                }
                f=a[i][i];
                g = -SIGN(sqrt(s),f);
                h=f*g-s;
                a[i][i]=f-g;
                for (j=l;j<=n;j++) {
                    for (s=0.0,k=i;k<=m;k++) s += a[k][i]*a[k][j];
                    f=s/h;
                    for (k=i;k<=m;k++) a[k][j] += f*a[k][i];
                }
                for (k=i;k<=m;k++) a[k][i] *= scale;
            }
        }
        w[i]=scale*g;
        g=s=scale=0.0;
        if (i <= m && i != n) {
            for (k=l;k<=n;k++) scale += fabs(a[i][k]);
            if (scale) {

                for (k=l;k<=n;k++) {
                    a[i][k] /= scale;
                    s += a[i][k]*a[i][k];
                }
                f=a[i][l];
                g = -SIGN(sqrt(s),f);
                h=f*g-s;
                a[i][l]=f-g;
                for (k=l;k<=n;k++) rv1[k]=a[i][k]/h;
                for (j=l;j<=m;j++) {
                    for (s=0.0,k=l;k<=n;k++) s += a[j][k]*a[i][k];
                    for (k=l;k<=n;k++) a[j][k] += s*rv1[k];
                }
                for (k=l;k<=n;k++) a[i][k] *= scale;
            }
        }
        anorm=FMAX(anorm,(fabs(w[i])+fabs(rv1[i])));
    }
    for (i=n;i>=1;i--) {        /* Accumulation of right-hand transformations. */
        if (i < n) {
            if (g) {
                for (j=l;j<=n;j++)    /* Double division to avoid possible underflow. */
                    v[j][i]=(a[i][j]/a[i][l])/g;
                for (j=l;j<=n;j++) {
                    for (s=0.0,k=l;k<=n;k++) s += a[i][k]*v[k][j];
                    for (k=l;k<=n;k++) v[k][j] += s*v[k][i];
                }
            }
            for (j=l;j<=n;j++) v[i][j]=v[j][i]=0.0;
        }
        v[i][i]=1.0;
        g=rv1[i];
        l=i;
    }
    for (i=IMIN(m,n);i>=1;i--) {    /* Accumulation of left-hand transformations. */
        l=i+1;
        g=w[i];
        for (j=l;j<=n;j++) a[i][j]=0.0;
        if (g) {
            g=1.0/g;
            for (j=l;j<=n;j++) {
                for (s=0.0,k=l;k<=m;k++) s += a[k][i]*a[k][j];
                f=(s/a[i][i])*g;
                for (k=i;k<=m;k++) a[k][j] += f*a[k][i];
            }
            for (j=i;j<=m;j++) a[j][i] *= g;
        } else for (j=i;j<=m;j++) a[j][i]=0.0;
        ++a[i][i];
    }
    for (k=n;k>=1;k--) {        /* Diagonalization of the bidiagonal form: Loop over
                                   singular values, and over allowed iterations. */
        for (its=1;its<=30;its++) {
            flag=1;
            for (l=k;l>=1;l--) {        /* Test for splitting. */
                nm=l-1;                 /* Note that rv1[1] is always zero. */
                if ((float)(fabs(rv1[l])+anorm) == anorm) {
                    flag=0;
                    break;
                }
                if ((float)(fabs(w[nm])+anorm) == anorm) break;
            }
            if (flag) {
                c=0.0;                  /* Cancellation of rv1[l], if l > 1. */
                s=1.0;
                for (i=l;i<=k;i++) {

                    f=s*rv1[i];
                    rv1[i]=c*rv1[i];
                    if ((float)(fabs(f)+anorm) == anorm) break;
                    g=w[i];
                    h=pythag(f,g);
                    w[i]=h;
                    h=1.0/h;
                    c=g*h;
                    s = -f*h;
                    for (j=1;j<=m;j++) {
                        y=a[j][nm];
                        z=a[j][i];
                        a[j][nm]=y*c+z*s;
                        a[j][i]=z*c-y*s;
                    }
                }
            }
            z=w[k];
            if (l == k) {            /* Convergence. */
                if (z < 0.0) {       /* Singular value is made nonnegative. */
                    w[k] = -z;
                    for (j=1;j<=n;j++) v[j][k] = -v[j][k];
                }
                break;
            }
            if (its == 30) nrerror("no convergence in 30 svdcmp iterations");
            x=w[l];                  /* Shift from bottom 2-by-2 minor. */
            nm=k-1;
            y=w[nm];
            g=rv1[nm];
            h=rv1[k];
            f=((y-z)*(y+z)+(g-h)*(g+h))/(2.0*h*y);
            g=pythag(f,1.0);
            f=((x-z)*(x+z)+h*((y/(f+SIGN(g,f)))-h))/x;
            c=s=1.0;                 /* Next QR transformation: */
            for (j=l;j<=nm;j++) {
                i=j+1;
                g=rv1[i];
                y=w[i];
                h=s*g;
                g=c*g;
                z=pythag(f,h);
                rv1[j]=z;
                c=f/z;
                s=h/z;
                f=x*c+g*s;
                g = g*c-x*s;
                h=y*s;
                y *= c;
                for (jj=1;jj<=n;jj++) {
                    x=v[jj][j];
                    z=v[jj][i];
                    v[jj][j]=x*c+z*s;
                    v[jj][i]=z*c-x*s;
                }
                z=pythag(f,h);
                w[j]=z;              /* Rotation can be arbitrary if z = 0. */
                if (z) {
                    z=1.0/z;
                    c=f*z;
                    s=h*z;
                }
                f=c*g+s*y;
                x=c*y-s*g;

                for (jj=1;jj<=m;jj++) {
                    y=a[jj][j];
                    z=a[jj][i];
                    a[jj][j]=y*c+z*s;
                    a[jj][i]=z*c-y*s;
                }
            }
            rv1[l]=0.0;
            rv1[k]=f;
            w[k]=x;
        }
    }
    free_vector(rv1,1,n);
}

#include <math.h>
#include "nrutil.h"

float pythag(float a, float b)
/* Computes (a² + b²)^(1/2) without destructive underflow or overflow. */
{
    float absa,absb;
    absa=fabs(a);
    absb=fabs(b);
    if (absa > absb) return absa*sqrt(1.0+SQR(absb/absa));
    else return (absb == 0.0 ? 0.0 : absb*sqrt(1.0+SQR(absa/absb)));
}

(Double precision versions of svdcmp, svbksb, and pythag, named dsvdcmp, dsvbksb, and dpythag, are used by the routine ratlsq in §5.13. You can easily make the conversions, or else get the converted routines from the Numerical Recipes diskette.)

CITED REFERENCES AND FURTHER READING:
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), §8.3 and Chapter 12.
Lawson, C.L., and Hanson, R. 1974, Solving Least Squares Problems (Englewood Cliffs, NJ: Prentice-Hall), Chapter 18.
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), Chapter 9. [1]
Wilkinson, J.H., and Reinsch, C. 1971, Linear Algebra, vol. II of Handbook for Automatic Computation (New York: Springer-Verlag), Chapter I.10 by G.H. Golub and C. Reinsch. [2]
Dongarra, J.J., et al. 1979, LINPACK User's Guide (Philadelphia: S.I.A.M.), Chapter 11. [3]
Smith, B.T., et al. 1976, Matrix Eigensystem Routines - EISPACK Guide, 2nd ed., vol. 6 of Lecture Notes in Computer Science (New York: Springer-Verlag).
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §6.7. [4]
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), §5.2.6. [5]

2.7 Sparse Linear Systems

A system of linear equations is called sparse if only a relatively small number of its matrix elements a_ij are nonzero. It is wasteful to use general methods of linear algebra on such problems, because most of the O(N³) arithmetic operations devoted to solving the set of equations or inverting the matrix involve zero operands. Furthermore, you might wish to work problems so large as to tax your available memory space, and it is wasteful to reserve storage for unfruitful zero elements. Note that there are two distinct (and not always compatible) goals for any sparse matrix method: saving time and/or saving space.

We have already considered one archetypal sparse form in §2.4, the band diagonal matrix. In the tridiagonal case, e.g., we saw that it was possible to save both time (order N instead of N³) and space (order N instead of N²). The method of solution was not different in principle from the general method of LU decomposition; it was just applied cleverly, and with due attention to the bookkeeping of zero elements. Many practical schemes for dealing with sparse problems have this same character. They are fundamentally decomposition schemes, or else elimination schemes akin to Gauss-Jordan, but carefully optimized so as to minimize the number of so-called fill-ins, initially zero elements which must become nonzero during the solution process, and for which storage must be reserved.

Direct methods for solving sparse equations, then, depend crucially on the precise pattern of sparsity of the matrix. Patterns that occur frequently, or that are useful as way-stations in the reduction of more general forms, already have special names and special methods of solution. We do not have space here for any detailed review of these. References listed at the end of this section will furnish you with an "in" to the specialized literature, and the following list of buzz words (and Figure 2.7.1) will at least let you hold your own at cocktail parties:

• tridiagonal
• band diagonal (or banded) with bandwidth M
• band triangular
• block diagonal
• block tridiagonal
• block triangular
• cyclic banded
• singly (or doubly) bordered block diagonal
• singly (or doubly) bordered block triangular
• singly (or doubly) bordered band diagonal
• singly (or doubly) bordered band triangular
• other (!)

You should also be aware of some of the special sparse forms that occur in the solution of partial differential equations in two or more dimensions. See Chapter 19.

If your particular pattern of sparsity is not a simple one, then you may wish to try an analyze/factorize/operate package, which automates the procedure of figuring out how fill-ins are to be minimized. The analyze stage is done once only for each pattern of sparsity. The factorize stage is done once for each particular matrix that fits the pattern. The operate stage is performed once for each right-hand side to

be used with the particular matrix.

Figure 2.7.1. Some standard forms for sparse matrices. (a) Band diagonal; (b) block triangular; (c) block tridiagonal; (d) singly bordered block diagonal; (e) doubly bordered block diagonal; (f) singly bordered block triangular; (g) bordered band-triangular; (h) and (i) singly and doubly bordered band diagonal; (j) and (k) other! (after Tewarson) [1].

Consult [2,3] for references on this. The NAG library [4] has an analyze/factorize/operate capability. A substantial collection of routines for sparse matrix calculation is also available from IMSL [5] as the Yale Sparse Matrix Package [6].

You should be aware that the special order of interchanges and eliminations,

prescribed by a sparse matrix method so as to minimize fill-ins and arithmetic operations, generally acts to decrease the method's numerical stability as compared to, e.g., regular LU decomposition with pivoting. Scaling your problem so as to make its nonzero matrix elements have comparable magnitudes (if you can do it) will sometimes ameliorate this problem.

In the remainder of this section, we present some concepts which are applicable to some general classes of sparse matrices, and which do not necessarily depend on details of the pattern of sparsity.

Sherman-Morrison Formula

Suppose that you have already obtained, by herculean effort, the inverse matrix A⁻¹ of a square matrix A. Now you want to make a "small" change in A, for example change one element a_ij, or a few elements, or one row, or one column. Is there any way of calculating the corresponding change in A⁻¹ without repeating your difficult labors? Yes, if your change is of the form

    A → (A + u ⊗ v)        (2.7.1)

for some vectors u and v. If u is a unit vector e_i, then (2.7.1) adds the components of v to the ith row. (Recall that u ⊗ v is a matrix whose i,jth element is the product of the ith component of u and the jth component of v.) If v is a unit vector e_j, then (2.7.1) adds the components of u to the jth column. If both u and v are proportional to unit vectors e_i and e_j respectively, then a term is added only to the element a_ij.

The Sherman-Morrison formula gives the inverse (A + u ⊗ v)⁻¹, and is derived briefly as follows:

    (A + u ⊗ v)⁻¹ = (1 + A⁻¹ · u ⊗ v)⁻¹ · A⁻¹
                  = (1 − A⁻¹ · u ⊗ v + A⁻¹ · u ⊗ v · A⁻¹ · u ⊗ v − ···) · A⁻¹
                  = A⁻¹ − A⁻¹ · u ⊗ v · A⁻¹ (1 − λ + λ² − ···)
                  = A⁻¹ − (A⁻¹ · u) ⊗ (v · A⁻¹) / (1 + λ)        (2.7.2)

where

    λ ≡ v · A⁻¹ · u        (2.7.3)

The second line of (2.7.2) is a formal power series expansion. In the third line, the associativity of outer and inner products is used to factor out the scalars λ.

The use of (2.7.2) is this: Given A⁻¹ and the vectors u and v, we need only perform two matrix multiplications and a vector dot product,

    z ≡ A⁻¹ · u        w ≡ (A⁻¹)ᵀ · v        λ = v · z        (2.7.4)

to get the desired change in the inverse

    A⁻¹ → A⁻¹ − z ⊗ w / (1 + λ)        (2.7.5)
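Here is a tiny self-contained demonstration of the update (2.7.4)-(2.7.5) (our sketch, not the book's code). The matrix and vectors are placeholders: A = diag(2,4) with known inverse, modified by adding 1 to the element a_21; the printed result should equal the exact inverse of the modified matrix.

#include <stdio.h>

#define N 2

int main(void)
{
    double ainv[N][N] = {{0.5,0.0},{0.0,0.25}};   /* inverse of diag(2,4) */
    double u[N] = {0.0, 1.0};      /* u = e_2, v = e_1: adds 1 to a_21 */
    double v[N] = {1.0, 0.0};
    double z[N], w[N], lambda = 0.0;
    int i, j;

    for (i = 0; i < N; i++) {      /* (2.7.4): z = Ainv.u, w = (Ainv)^T.v */
        z[i] = w[i] = 0.0;
        for (j = 0; j < N; j++) {
            z[i] += ainv[i][j]*u[j];
            w[i] += ainv[j][i]*v[j];
        }
    }
    for (i = 0; i < N; i++) lambda += v[i]*z[i];   /* lambda = v.z */
    for (i = 0; i < N; i++)        /* (2.7.5): Ainv -= z (x) w / (1+lambda) */
        for (j = 0; j < N; j++)
            ainv[i][j] -= z[i]*w[j]/(1.0+lambda);
    for (i = 0; i < N; i++, printf("\n"))
        for (j = 0; j < N; j++) printf("%10.6f", ainv[i][j]);
    return 0;
}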

The whole procedure requires only 3N² multiplies and a like number of adds (an even smaller number if u or v is a unit vector).

The Sherman-Morrison formula can be directly applied to a class of sparse problems. If you already have a fast way of calculating the inverse of A (e.g., a tridiagonal matrix, or some other standard sparse form), then (2.7.4)–(2.7.5) allow you to build up to your related but more complicated form, adding for example a row or column at a time. Notice that you can apply the Sherman-Morrison formula more than once successively, using at each stage the most recent update of A⁻¹ (equation 2.7.5). Of course, if you have to modify every row, then you are back to an N³ method. The constant in front of the N³ is only a few times worse than the better direct methods, but you have deprived yourself of the stabilizing advantages of pivoting, so be careful.

For some other sparse problems, the Sherman-Morrison formula cannot be directly applied for the simple reason that storage of the whole inverse matrix A⁻¹ is not feasible. If you want to add only a single correction of the form u ⊗ v, and solve the linear system

    (A + u ⊗ v) · x = b        (2.7.6)

then you proceed as follows. Using the fast method that is presumed available for the matrix A, solve the two auxiliary problems

    A · y = b        A · z = u        (2.7.7)

for the vectors y and z. In terms of these,

    x = y − [ (v · y) / (1 + v · z) ] · z        (2.7.8)

as we see by multiplying (2.7.2) on the right by b.

Cyclic Tridiagonal Systems

So-called cyclic tridiagonal systems occur quite frequently, and are a good example of how to use the Sherman-Morrison formula in the manner just described. The equations have the form

    \begin{pmatrix}
    b_1 & c_1 & 0 & \cdots & & & \beta \\
    a_2 & b_2 & c_2 & \cdots & & & \\
    & & & \cdots & & & \\
    & & & \cdots & a_{N-1} & b_{N-1} & c_{N-1} \\
    \alpha & & & \cdots & 0 & a_N & b_N
    \end{pmatrix} \cdot
    \begin{pmatrix} x_1 \\ x_2 \\ \cdots \\ x_{N-1} \\ x_N \end{pmatrix} =
    \begin{pmatrix} r_1 \\ r_2 \\ \cdots \\ r_{N-1} \\ r_N \end{pmatrix}        (2.7.9)

This is a tridiagonal system, except for the matrix elements α and β in the corners. Forms like this are typically generated by finite-differencing differential equations with periodic boundary conditions (§19.4).

We use the Sherman-Morrison formula, treating the system as tridiagonal plus a correction. In the notation of equation (2.7.6), define vectors u and v to be

    u = \begin{pmatrix} \gamma \\ 0 \\ \vdots \\ 0 \\ \alpha \end{pmatrix} \qquad
    v = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ \beta/\gamma \end{pmatrix}        (2.7.10)

Here γ is arbitrary for the moment. Then the matrix A is the tridiagonal part of the matrix in (2.7.9), with two terms modified:

    b′₁ = b₁ − γ,    b′_N = b_N − αβ/γ        (2.7.11)

We now solve equations (2.7.7) with the standard tridiagonal algorithm, and then get the solution from equation (2.7.8).

The routine cyclic below implements this algorithm. We choose the arbitrary parameter γ = −b₁ to avoid loss of precision by subtraction in the first of equations (2.7.11). In the unlikely event that this causes loss of precision in the second of these equations, you can make a different choice.

#include "nrutil.h"

void cyclic(float a[], float b[], float c[], float alpha, float beta,
    float r[], float x[], unsigned long n)
/* Solves for a vector x[1..n] the "cyclic" set of linear equations given by equation
(2.7.9). a, b, c, and r are input vectors, all dimensioned as [1..n], while alpha and
beta are the corner entries in the matrix. The input is not modified. */
{
    void tridag(float a[], float b[], float c[], float r[], float u[],
        unsigned long n);
    unsigned long i;
    float fact,gamma,*bb,*u,*z;

    if (n <= 2) nrerror("n too small in cyclic");
    bb=vector(1,n);
    u=vector(1,n);
    z=vector(1,n);
    gamma = -b[1];               /* Avoid subtraction error in forming bb[1]. */
    bb[1]=b[1]-gamma;            /* Set up the diagonal of the modified tridiagonal
                                    system. */
    bb[n]=b[n]-alpha*beta/gamma;
    for (i=2;i<n;i++) bb[i]=b[i];
    tridag(a,bb,c,r,x,n);        /* Solve A · x = r. */
    u[1]=gamma;                  /* Set up the vector u. */
    u[n]=alpha;
    for (i=2;i<n;i++) u[i]=0.0;
    tridag(a,bb,c,u,z,n);        /* Solve A · z = u. */
    fact=(x[1]+beta*x[n]/gamma)/ /* Form v · x/(1 + v · z). */
        (1.0+z[1]+beta*z[n]/gamma);
    for (i=1;i<=n;i++) x[i] -= fact*z[i];   /* Now get the solution vector x. */
    free_vector(z,1,n);
    free_vector(u,1,n);
    free_vector(bb,1,n);
}
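A hypothetical usage sketch (our addition): a diagonally dominant periodic stencil, assuming the book's tridag (§2.4) and the nrutil allocators are linked in. The stencil values and right-hand side are placeholders; the diagonal is taken strong enough that the cyclic matrix is safely nonsingular.

#include "nrutil.h"

void periodic_example(unsigned long n)
{
    void cyclic(float a[], float b[], float c[], float alpha, float beta,
        float r[], float x[], unsigned long n);
    float *a, *b, *c, *r, *x, alpha, beta;
    unsigned long i;

    a=vector(1,n); b=vector(1,n); c=vector(1,n);
    r=vector(1,n); x=vector(1,n);
    for (i=1;i<=n;i++) {
        a[i] = -1.0;         /* subdiagonal (a[1] unused by tridag) */
        b[i] =  4.0;         /* diagonally dominant, hence nonsingular */
        c[i] = -1.0;         /* superdiagonal (c[n] unused) */
        r[i] =  1.0;         /* placeholder right-hand side */
    }
    alpha = beta = -1.0;     /* the periodic corner elements */
    cyclic(a,b,c,alpha,beta,r,x,n);
    /* x[1..n] now holds the solution. */
    free_vector(x,1,n); free_vector(r,1,n);
    free_vector(c,1,n); free_vector(b,1,n); free_vector(a,1,n);
}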

Woodbury Formula

If you want to add more than a single correction term, then you cannot use (2.7.8) repeatedly, since without storing a new A⁻¹ you will not be able to solve the auxiliary problems (2.7.7) efficiently after the first step. Instead, you need the Woodbury formula, which is the block-matrix version of the Sherman-Morrison formula,

    (A + U · Vᵀ)⁻¹ = A⁻¹ − A⁻¹ · U · (1 + Vᵀ · A⁻¹ · U)⁻¹ · Vᵀ · A⁻¹        (2.7.12)

Here A is, as usual, an N × N matrix, while U and V are N × P matrices with P < N and usually P ≪ N. To solve the linear system (A + U · Vᵀ) · x = b when a fast method is available for A itself, first solve the P auxiliary problems A · Z = U, column by column, for the N × P matrix Z, and form the P × P matrix H ≡ (1 + Vᵀ · Z)⁻¹.

Finally, solve the one further auxiliary problem

    A · y = b        (2.7.20)

In terms of these quantities, the solution is given by

    x = y − Z · [H · (Vᵀ · y)]        (2.7.21)

Inversion by Partitioning

Once in a while, you will encounter a matrix (not even necessarily sparse) that can be inverted efficiently by partitioning. Suppose that the N × N matrix A is partitioned into

    A = \begin{pmatrix} P & Q \\ R & S \end{pmatrix}        (2.7.22)

where P and S are square matrices of size p × p and s × s respectively (p + s = N). The matrices Q and R are not necessarily square, and have sizes p × s and s × p, respectively.

If the inverse of A is partitioned in the same manner,

    A^{-1} = \begin{pmatrix} \tilde P & \tilde Q \\ \tilde R & \tilde S \end{pmatrix}        (2.7.23)

then P̃, Q̃, R̃, S̃, which have the same sizes as P, Q, R, S, respectively, can be found by either the formulas

    P̃ = (P − Q · S⁻¹ · R)⁻¹
    Q̃ = −(P − Q · S⁻¹ · R)⁻¹ · (Q · S⁻¹)
    R̃ = −(S⁻¹ · R) · (P − Q · S⁻¹ · R)⁻¹        (2.7.24)
    S̃ = S⁻¹ + (S⁻¹ · R) · (P − Q · S⁻¹ · R)⁻¹ · (Q · S⁻¹)

or else by the equivalent formulas

    P̃ = P⁻¹ + (P⁻¹ · Q) · (S − R · P⁻¹ · Q)⁻¹ · (R · P⁻¹)
    Q̃ = −(P⁻¹ · Q) · (S − R · P⁻¹ · Q)⁻¹
    R̃ = −(S − R · P⁻¹ · Q)⁻¹ · (R · P⁻¹)        (2.7.25)
    S̃ = (S − R · P⁻¹ · Q)⁻¹

The parentheses in equations (2.7.24) and (2.7.25) highlight repeated factors that you may wish to compute only once. (Of course, by associativity, you can instead do the matrix multiplications in any order you like.)
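As a quick sanity check of (2.7.24) (our addition, not from the text), take p = s = 1, so that P, Q, R, S are scalars p, q, r, s. Then

    \tilde P = (p - q s^{-1} r)^{-1} = \frac{s}{ps - qr}, \qquad
    \tilde Q = -\frac{q}{ps - qr}, \qquad
    \tilde R = -\frac{r}{ps - qr}, \qquad
    \tilde S = \frac{1}{s} + \frac{rq}{s\,(ps - qr)} = \frac{p}{ps - qr}

so that

    A^{-1} = \frac{1}{ps - qr} \begin{pmatrix} s & -q \\ -r & p \end{pmatrix}

which is the familiar inverse of a 2 × 2 matrix.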

Another sometimes useful formula is for the determinant of the partitioned matrix,

$$\det A = \det P \,\det\left(S - R\cdot P^{-1}\cdot Q\right) = \det S \,\det\left(P - Q\cdot S^{-1}\cdot R\right) \tag{2.7.26}$$

Indexed Storage of Sparse Matrices

We have already seen (§2.4) that tri- or band-diagonal matrices can be stored in a compact format that allocates storage only to elements which can be nonzero, plus perhaps a few wasted locations to make the bookkeeping easier. What about more general sparse matrices? When a sparse matrix of dimension $N \times N$ contains only a few times $N$ nonzero elements (a typical case), it is surely inefficient (and often physically impossible) to allocate storage for all $N^2$ elements. Even if one did allocate such storage, it would be inefficient or prohibitive in machine time to loop over all of it in search of nonzero elements.

Obviously some kind of indexed storage scheme is required, one that stores only nonzero matrix elements, along with sufficient auxiliary information to determine where an element logically belongs and how the various elements can be looped over in common matrix operations. Unfortunately, there is no one standard scheme in general use. Knuth [7] describes one method. The Yale Sparse Matrix Package [6] and ITPACK [8] describe several other methods. For most applications, we favor the storage scheme used by PCGPACK [9], which is almost the same as that described by Bentley [10], and also similar to one of the Yale Sparse Matrix Package methods. The advantage of this scheme, which can be called row-indexed sparse storage mode, is that it requires storage of only about two times the number of nonzero matrix elements. (Other methods can require as much as three or five times.) For simplicity, we will treat only the case of square matrices, which occurs most frequently in practice.

To represent a matrix A of dimension N × N, the row-indexed scheme sets up two one-dimensional arrays, call them sa and ija. The first of these stores matrix element values in single or double precision as desired; the second stores integer values. The storage rules are:

• The first N locations of sa store A's diagonal matrix elements, in order. (Note that diagonal elements are stored even if they are zero; this is at most a slight storage inefficiency, since diagonal elements are nonzero in most realistic applications.)

• Each of the first N locations of ija stores the index of the array sa that contains the first off-diagonal element of the corresponding row of the matrix. (If there are no off-diagonal elements for that row, it is one greater than the index in sa of the most recently stored element of a previous row.)

• Location 1 of ija is always equal to N + 2. (It can be read to determine N.)
• Location N + 1 of ija is one greater than the index in sa of the last off-diagonal element of the last row. (It can be read to determine the number of nonzero elements in the matrix, or the number of elements in the arrays sa and ija.) Location N + 1 of sa is not used and can be set arbitrarily.

• Entries in sa at locations ≥ N + 2 contain A's off-diagonal values, ordered by rows and, within each row, ordered by columns.

• Entries in ija at locations ≥ N + 2 contain the column number of the corresponding element in sa.

While these rules seem arbitrary at first sight, they result in a rather elegant storage scheme. As an example, consider the matrix

$$\begin{bmatrix}
3. & 0. & 1. & 0. & 0. \\
0. & 4. & 0. & 0. & 0. \\
0. & 7. & 5. & 9. & 0. \\
0. & 0. & 0. & 0. & 2. \\
0. & 0. & 0. & 6. & 5.
\end{bmatrix} \tag{2.7.27}$$

In row-indexed compact storage, matrix (2.7.27) is represented by the two arrays of length 11, as follows

index k   1    2    3    4    5    6    7    8    9    10   11
ija[k]    7    8    8    10   11   12   3    2    4    5    4
sa[k]     3.   4.   5.   0.   5.   x    1.   7.   9.   2.   6.

(2.7.28)

Here x is an arbitrary value. Notice that, according to the storage rules, the value of N (namely 5) is ija[1]-2, and the length of each array is ija[ija[1]-1]-1, namely 11. The diagonal element in row i is sa[i], and the off-diagonal elements in that row are in sa[k] where k loops from ija[i] to ija[i+1]-1, if the upper limit is greater or equal to the lower one (as in C's for loops).

Here is a routine, sprsin, that converts a matrix from full storage mode into row-indexed sparse storage mode, throwing away any elements that are less than a specified threshold. Of course, the principal use of sparse storage mode is for matrices whose full storage mode won't fit into your machine at all; then you have to generate them directly into sparse format. Nevertheless sprsin is useful as a precise algorithmic definition of the storage scheme, for subscale testing of large problems, and for the case where execution time, rather than storage, furnishes the impetus to sparse storage.

#include <math.h>

void sprsin(float **a, int n, float thresh, unsigned long nmax, float sa[],
    unsigned long ija[])
Converts a square matrix a[1..n][1..n] into row-indexed sparse storage mode. Only elements
of a with magnitude ≥ thresh are retained. Output is in two linear arrays with dimension
nmax (an input parameter): sa[1..] contains array values, indexed by ija[1..]. The number
of elements filled of sa and ija on output are both ija[ija[1]-1]-1 (see text).
{
    void nrerror(char error_text[]);
    int i,j;
    unsigned long k;

    for (j=1;j<=n;j++) sa[j]=a[j][j];    /* Store diagonal elements. */
    ija[1]=n+2;                          /* Index to 1st row off-diagonal element, if any. */
    k=n+1;
    for (i=1;i<=n;i++) {                 /* Loop over rows. */
        for (j=1;j<=n;j++) {             /* Loop over columns. */
            if (fabs(a[i][j]) >= thresh && i != j) {
                if (++k > nmax) nrerror("sprsin: nmax too small");
                sa[k]=a[i][j];           /* Store off-diagonal elements and their columns. */
                ija[k]=j;
            }
        }
        ija[i+1]=k+1;                    /* As each row is completed, store index to next. */
    }
}

The single most important use of a matrix in row-indexed sparse storage mode is to multiply a vector to its right. In fact, the storage mode is optimized for just this purpose. The following routine is thus very simple.

void sprsax(float sa[], unsigned long ija[], float x[], float b[],
    unsigned long n)
Multiply a matrix in row-index sparse storage arrays sa and ija by a vector x[1..n], giving
a vector b[1..n].
{
    void nrerror(char error_text[]);

    unsigned long i,k;

    if (ija[1] != n+2) nrerror("sprsax: mismatched vector and matrix");
    for (i=1;i<=n;i++) {
        b[i]=sa[i]*x[i];                     /* Start with diagonal term. */
        for (k=ija[i];k<=ija[i+1]-1;k++)     /* Loop over off-diagonal terms. */
            b[i] += sa[k]*x[ija[k]];
    }
}

It is also simple to multiply the transpose of a matrix by a vector to its right. (We will use this operation later in this section.) Note that the transpose matrix is not actually constructed.

void sprstx(float sa[], unsigned long ija[], float x[], float b[],
    unsigned long n)
Multiply the transpose of a matrix in row-index sparse storage arrays sa and ija by a vector
x[1..n], giving a vector b[1..n].
{
    void nrerror(char error_text[]);
    unsigned long i,j,k;

    if (ija[1] != n+2) nrerror("mismatched vector and matrix in sprstx");
    for (i=1;i<=n;i++) b[i]=sa[i]*x[i];      /* Start with diagonal terms. */
    for (i=1;i<=n;i++) {                     /* Loop over off-diagonal terms. */
        for (k=ija[i];k<=ija[i+1]-1;k++) {
            j=ija[k];
            b[j] += sa[k]*x[i];
        }
    }
}

(Double precision versions of sprsax and sprstx, named dsprsax and dsprstx, are used by the routine atimes later in this section. You can easily make the conversion, or else get the converted routines from the Numerical Recipes diskettes.)

In fact, because the choice of row-indexed storage treats rows and columns quite differently, it is quite an involved operation to construct the transpose of a matrix, given the matrix itself in row-indexed sparse storage mode. When the operation cannot be avoided, it is done as follows: An index of all off-diagonal elements by their columns is constructed (see §8.4). The elements are then written to the output array in column order. As each element is written, its row is determined and stored. Finally, the elements in each column are sorted by row.

void sprstp(float sa[], unsigned long ija[], float sb[], unsigned long ijb[])
Construct the transpose of a sparse square matrix, from row-index sparse storage arrays sa
and ija into arrays sb and ijb.
{
    void iindexx(unsigned long n, long arr[], unsigned long indx[]);
    /* Version of indexx with all float variables changed to long. */
    unsigned long j,jl,jm,jp,ju,k,m,n2,noff,inc,iv;
    float v;

    n2=ija[1];                               /* Linear size of matrix plus 2. */
    for (j=1;j<=n2-2;j++) sb[j]=sa[j];       /* Diagonal elements. */
    iindexx(ija[n2-1]-ija[1],(long *)&ija[n2-1],&ijb[n2-1]);
    /* Index all off-diagonal elements by their columns. */
    jp=0;
    for (k=ija[1];k<=ija[n2-1]-1;k++) {      /* Loop over output off-diagonal elements. */
        m=ijb[k]+n2-1;                       /* Use index table to store by (former) columns. */
        sb[k]=sa[m];
        for (j=jp+1;j<=ija[m];j++) ijb[j]=k; /* Fill in the index to any omitted rows. */

        jp=ija[m];
        jl=1;                                /* Use bisection to find which row element */
        ju=n2-1;                             /* m is in and put that into ijb[k]. */
        while (ju-jl > 1) {
            jm=(ju+jl)/2;
            if (ija[jm] > m) ju=jm; else jl=jm;
        }
        ijb[k]=jl;
    }
    for (j=jp+1;j<n2;j++) ijb[j]=ija[n2-1];
    for (j=1;j<=n2-2;j++) {                  /* Shell sort the entries of each output row */
        jl=ijb[j+1]-ijb[j];                  /* into column order. */
        noff=ijb[j]-1;
        inc=1;
        do {
            inc *= 3;
            inc++;
        } while (inc <= jl);
        do {
            inc /= 3;
            for (k=noff+inc+1;k<=noff+jl;k++) {
                iv=ijb[k];
                v=sb[k];
                m=k;
                while (ijb[m-inc] > iv) {
                    ijb[m]=ijb[m-inc];
                    sb[m]=sb[m-inc];
                    m -= inc;
                    if (m-noff <= inc) break;
                }
                ijb[m]=iv;
                sb[m]=v;
            }
        } while (inc > 1);
    }
}

The above routine embeds internally a sorting algorithm from §8.1, but calls the external routine iindexx to construct the initial column index. This routine is identical to indexx, as listed in §8.4, except that the latter's two float declarations should be changed to long. (The Numerical Recipes diskettes include both indexx and iindexx.) In fact, you can often use indexx without making these changes, since many computers have the property that numerical values will sort correctly independently of whether they are interpreted as floating or integer values.

As final examples of the manipulation of sparse matrices, we give two routines for the multiplication of two sparse matrices. These are useful for techniques to be described in §13.10.

In general, the product of two sparse matrices is not itself sparse. One therefore wants to limit the size of the product matrix in one of two ways: either compute only those elements of the product that are specified in advance by a known pattern of sparsity, or else compute all nonzero elements, but store only those whose magnitude exceeds some threshold value. The former technique, when it can be used, is quite efficient. The pattern of sparsity is specified by furnishing an index array in row-index sparse storage format (e.g., ija). The program then constructs a corresponding value array (e.g., sa). The latter technique runs the danger of excessive compute times and unknown output sizes, so it must be used cautiously.

With row-index storage, it is much more natural to multiply a matrix (on the left) by the transpose of a matrix (on the right), so that one is crunching rows on rows, rather than rows on columns. Our routines therefore calculate A·Bᵀ, rather than A·B. This means that you have to run your right-hand matrix through the transpose routine sprstp before sending it to the matrix multiply routine.
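To make the storage scheme concrete, here is a small driver (an illustration added here, not from the original) that packs matrix (2.7.27) with sprsin, multiplies it onto a vector with sprsax, and forms its transpose with sprstp; it assumes the routines above and the nrutil allocation functions are linked in:

#include <stdio.h>
#include "nrutil.h"

void sprsin(float **a, int n, float thresh, unsigned long nmax, float sa[],
    unsigned long ija[]);
void sprsax(float sa[], unsigned long ija[], float x[], float b[],
    unsigned long n);
void sprstp(float sa[], unsigned long ija[], float sb[], unsigned long ijb[]);

int main(void)
{
    unsigned long ija[12],ijb[12];
    float sa[12],sb[12],x[6],b[6];
    float **a;
    static float row[5][5]={              /* matrix (2.7.27) */
        {3.0,0.0,1.0,0.0,0.0},
        {0.0,4.0,0.0,0.0,0.0},
        {0.0,7.0,5.0,9.0,0.0},
        {0.0,0.0,0.0,0.0,2.0},
        {0.0,0.0,0.0,6.0,5.0}};
    int i,j,n=5;

    a=matrix(1,n,1,n);
    for (i=1;i<=n;i++)
        for (j=1;j<=n;j++) a[i][j]=row[i-1][j-1];
    sprsin(a,n,0.5,11,sa,ija);            /* pack; 11 slots suffice here */
    for (i=1;i<=n;i++) x[i]=1.0;
    sprsax(sa,ija,x,b,n);                 /* b = A · (1,1,1,1,1)^T */
    for (i=1;i<=n;i++) printf("b[%d] = %g\n",i,b[i]);
    sprstp(sa,ija,sb,ijb);                /* sb,ijb now hold the transpose of A */
    free_matrix(a,1,n,1,n);
    return 0;
}

With these inputs, b should come back as the row sums (4, 4, 21, 2, 11) of matrix (2.7.27).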

The two implementing routines, sprspm for "pattern multiply" and sprstm for "threshold multiply," are quite similar in structure. Both are complicated by the logic of the various combinations of diagonal or off-diagonal elements for the two input streams and output stream.

void sprspm(float sa[], unsigned long ija[], float sb[], unsigned long ijb[],
    float sc[], unsigned long ijc[])
Matrix multiply A·Bᵀ where A and B are two sparse matrices in row-index storage mode, and
Bᵀ is the transpose of B. Here, sa and ija store the matrix A; sb and ijb store the matrix B.
This routine computes only those components of the matrix product that are pre-specified by the
input index array ijc, which is not modified. On output, the arrays sc and ijc give the product
matrix in row-index storage mode. For sparse matrix multiplication, this routine will often be
preceded by a call to sprstp, so as to construct the transpose of a known matrix into sb, ijb.
{
    void nrerror(char error_text[]);
    unsigned long i,ijma,ijmb,j,m,ma,mb,mbb,mn;
    float sum;

    if (ija[1] != ijb[1] || ija[1] != ijc[1])
        nrerror("sprspm: sizes do not match");
    for (i=1;i<=ijc[1]-2;i++) {              /* Loop over rows. */
        j=m=i;                               /* Set up so that first pass through loop */
        mn=ijc[i];                           /* does the diagonal component. */
        sum=sa[i]*sb[i];
        for (;;) {                           /* Main loop over each component to be output. */
            mb=ijb[j];
            for (ma=ija[i];ma<=ija[i+1]-1;ma++) {
                /* Loop through elements in A's row. Convoluted logic, following, accounts
                   for the various combinations of diagonal and off-diagonal elements. */
                ijma=ija[ma];
                if (ijma == j) sum += sa[ma]*sb[j];
                else {
                    while (mb < ijb[j+1]) {
                        ijmb=ijb[mb];
                        if (ijmb == i) {
                            sum += sa[i]*sb[mb++];
                            continue;
                        } else if (ijmb < ijma) {
                            mb++;
                            continue;
                        } else if (ijmb == ijma) {
                            sum += sa[ma]*sb[mb++];
                            continue;
                        }
                        break;
                    }
                }
            }
            for (mbb=mb;mbb<=ijb[j+1]-1;mbb++) {   /* Exhaust the remainder of B's row. */
                if (ijb[mbb] == i) sum += sa[i]*sb[mbb];
            }
            sc[m]=sum;
            sum=0.0;                         /* Reset indices for next pass through loop. */
            if (mn >= ijc[i+1]) break;
            j=ijc[m=mn++];
        }
    }
}

#include <math.h>

void sprstm(float sa[], unsigned long ija[], float sb[], unsigned long ijb[],
    float thresh, unsigned long nmax, float sc[], unsigned long ijc[])

Matrix multiply A·Bᵀ where A and B are two sparse matrices in row-index storage mode, and
Bᵀ is the transpose of B. Here, sa and ija store the matrix A; sb and ijb store the matrix B.
This routine computes all components of the matrix product (which may be non-sparse!), but
stores only those whose magnitude exceeds thresh. On output, the arrays sc and ijc (whose
maximum size is input as nmax) give the product matrix in row-index storage mode. For sparse
matrix multiplication, this routine will often be preceded by a call to sprstp, so as to construct
the transpose of a known matrix into sb, ijb.
{
    void nrerror(char error_text[]);
    unsigned long i,ijma,ijmb,j,k,ma,mb,mbb;
    float sum;

    if (ija[1] != ijb[1]) nrerror("sprstm: sizes do not match");
    ijc[1]=k=ija[1];
    for (i=1;i<=ija[1]-2;i++) {              /* Loop over rows of A, */
        for (j=1;j<=ijb[1]-2;j++) {          /* and rows of B. */
            if (i == j) sum=sa[i]*sb[j]; else sum=0.0e0;
            mb=ijb[j];
            for (ma=ija[i];ma<=ija[i+1]-1;ma++) {
                /* Loop through elements in A's row. Convoluted logic, following, accounts
                   for the various combinations of diagonal and off-diagonal elements. */
                ijma=ija[ma];
                if (ijma == j) sum += sa[ma]*sb[j];
                else {
                    while (mb < ijb[j+1]) {
                        ijmb=ijb[mb];
                        if (ijmb == i) {
                            sum += sa[i]*sb[mb++];
                            continue;
                        } else if (ijmb < ijma) {
                            mb++;
                            continue;
                        } else if (ijmb == ijma) {
                            sum += sa[ma]*sb[mb++];
                            continue;
                        }
                        break;
                    }
                }
            }
            for (mbb=mb;mbb<=ijb[j+1]-1;mbb++) {   /* Exhaust the remainder of B's row. */
                if (ijb[mbb] == i) sum += sa[i]*sb[mbb];
            }
            if (i == j) sc[i]=sum;           /* Where to put the answer... */
            else if (fabs(sum) > thresh) {
                if (k > nmax) nrerror("sprstm: nmax too small");
                sc[k]=sum;
                ijc[k++]=j;
            }
        }
        ijc[i+1]=k;
    }
}

Conjugate Gradient Method for a Sparse System

So-called conjugate gradient methods provide a quite general means for solving the N × N linear system

$$A\cdot x = b \tag{2.7.29}$$

The attractiveness of these methods for large sparse systems is that they reference A only through its multiplication of a vector, or the multiplication of its transpose and a vector. As we have seen, these operations can be very efficient for a properly stored sparse matrix.

You, the "owner" of the matrix A, can be asked to provide functions that perform these sparse matrix multiplications as efficiently as possible. We, the "grand strategists," supply the general routine, linbcg below, that solves the set of linear equations, (2.7.29), using your functions.

The simplest, "ordinary" conjugate gradient algorithm [11-13] solves (2.7.29) only in the case that A is symmetric and positive definite. It is based on the idea of minimizing the function

$$f(x) = \tfrac{1}{2}\, x\cdot A\cdot x - b\cdot x \tag{2.7.30}$$

This function is minimized when its gradient

$$\nabla f = A\cdot x - b \tag{2.7.31}$$

is zero, which is equivalent to (2.7.29). The minimization is carried out by generating a succession of search directions $p_k$ and improved minimizers $x_k$. At each stage a quantity $\alpha_k$ is found that minimizes $f(x_k + \alpha_k p_k)$, and $x_{k+1}$ is set equal to the new point $x_k + \alpha_k p_k$. The $p_k$ and $x_k$ are built up in such a way that $x_{k+1}$ is also the minimizer of $f$ over the whole vector space of directions already taken, $\{p_1, p_2, \ldots, p_k\}$. After $N$ iterations you arrive at the minimizer over the entire vector space, i.e., the solution to (2.7.29).

Later, in §10.6, we will generalize this "ordinary" conjugate gradient algorithm to the minimization of arbitrary nonlinear functions. Here, where our interest is in solving linear, but not necessarily positive definite or symmetric, equations, a different generalization is important, the biconjugate gradient method. This method does not, in general, have a simple connection with function minimization. It constructs four sequences of vectors, $r_k$, $\bar r_k$, $p_k$, $\bar p_k$, $k = 1, 2, \ldots$. You supply the initial vectors $r_1$ and $\bar r_1$, and set $p_1 = r_1$, $\bar p_1 = \bar r_1$. Then you carry out the following recurrence:

$$\begin{aligned}
\alpha_k &= \frac{\bar r_k \cdot r_k}{\bar p_k \cdot A \cdot p_k} \\
r_{k+1} &= r_k - \alpha_k\, A \cdot p_k \\
\bar r_{k+1} &= \bar r_k - \alpha_k\, A^T \cdot \bar p_k \\
\beta_k &= \frac{\bar r_{k+1} \cdot r_{k+1}}{\bar r_k \cdot r_k} \\
p_{k+1} &= r_{k+1} + \beta_k\, p_k \\
\bar p_{k+1} &= \bar r_{k+1} + \beta_k\, \bar p_k
\end{aligned} \tag{2.7.32}$$

This sequence of vectors satisfies the biorthogonality condition

$$\bar r_i \cdot r_j = r_i \cdot \bar r_j = 0, \qquad j < i \tag{2.7.33}$$

and the biconjugacy condition

$$\bar p_i \cdot A \cdot p_j = p_i \cdot A^T \cdot \bar p_j = 0, \qquad j < i \tag{2.7.34}$$

There is also a mutual orthogonality,

$$\bar r_i \cdot p_j = r_i \cdot \bar p_j = 0, \qquad j < i \tag{2.7.35}$$

The proof of these properties proceeds by straightforward induction [14]. As long as the recurrence does not break down earlier because one of the denominators is zero, it must terminate after $m \le N$ steps with $r_{m+1} = \bar r_{m+1} = 0$. This is basically because after at most $N$ steps you run out of new orthogonal directions to the vectors you have already built up.

To use the algorithm to solve the system (2.7.29), make the choice

$$r_1 = b - A\cdot x_1 \tag{2.7.36}$$

where $x_1$ is your initial guess for the solution, and choose $\bar r_1 = r_1$. Then form the sequence of improved estimates

$$x_{k+1} = x_k + \alpha_k\, p_k \tag{2.7.37}$$

while carrying out the recurrence (2.7.32). Equation (2.7.37) guarantees that $r_{k+1}$ from the recurrence is in fact the residual $b - A\cdot x_{k+1}$ corresponding to $x_{k+1}$. Since $r_{m+1} = 0$, $x_{m+1}$ is the solution to equation (2.7.29).

While there is no guarantee that this whole procedure will not break down or become unstable for general A, in practice this is rare. More importantly, the exact termination in at most N iterations occurs only with exact arithmetic. Roundoff error means that you should regard the process as a genuinely iterative procedure, to be halted when some appropriate error criterion is met.

The ordinary conjugate gradient algorithm is the special case of the biconjugate gradient algorithm when A is symmetric, and we choose $\bar r_1 = r_1$. Then $\bar r_k = r_k$ and $\bar p_k = p_k$ for all $k$; you can omit computing them and halve the work of the algorithm. This conjugate gradient version has the interpretation of minimizing equation (2.7.30). If A is positive definite as well as symmetric, the algorithm cannot break down (in theory!). The routine linbcg below indeed reduces to the ordinary conjugate gradient method if you input a symmetric A, but it does all the redundant computations.

Another variant of the general algorithm corresponds to a symmetric but non-positive definite A, with the choice $\bar r_1 = A\cdot r_1$ instead of $\bar r_1 = r_1$. In this case $\bar r_k = A\cdot r_k$ and $\bar p_k = A\cdot p_k$ for all $k$. This algorithm is thus equivalent to the ordinary conjugate gradient algorithm, but with all dot products $a\cdot b$ replaced by $a\cdot A\cdot b$. It is called the minimum residual algorithm, because it corresponds to successive minimizations of the function

$$\Phi(x) = \tfrac{1}{2}\, r\cdot r = \tfrac{1}{2}\, |A\cdot x - b|^2 \tag{2.7.38}$$

where the successive iterates $x_k$ minimize $\Phi$ over the same set of search directions $p_k$ generated in the conjugate gradient method. This algorithm has been generalized in various ways for unsymmetric matrices. The generalized minimum residual method (GMRES; see [9,15]) is probably the most robust of these methods.

Note that equation (2.7.38) gives

$$\nabla \Phi(x) = A^T\cdot(A\cdot x - b) \tag{2.7.39}$$

For any nonsingular matrix $A$, $A^T\cdot A$ is symmetric and positive definite. You might therefore be tempted to solve equation (2.7.29) by applying the ordinary conjugate gradient algorithm to the problem

$$\left(A^T\cdot A\right)\cdot x = A^T\cdot b \tag{2.7.40}$$

Don't! The condition number of the matrix $A^T\cdot A$ is the square of the condition number of $A$ (see §2.6 for definition of condition number). A large condition number both increases the number of iterations required, and limits the accuracy to which a solution can be obtained. It is almost always better to apply the biconjugate gradient method to the original matrix $A$.

So far we have said nothing about the rate of convergence of these methods.
The ordinary conjugate gradient method works well for matrices that are well-conditioned, i.e., "close" to the identity matrix. This suggests applying these methods to the preconditioned form of equation (2.7.29),

$$\left(\widetilde A^{-1}\cdot A\right)\cdot x = \widetilde A^{-1}\cdot b \tag{2.7.41}$$

The idea is that you might already be able to solve your linear system easily for some $\widetilde A$ close to $A$, in which case $\widetilde A^{-1}\cdot A \approx 1$, allowing the algorithm to converge in fewer steps. The matrix $\widetilde A$ is called a preconditioner [11], and the overall scheme given here is known as the preconditioned biconjugate gradient method or PBCG.

For efficient implementation, the PBCG algorithm introduces an additional set of vectors $z_k$ and $\bar z_k$ defined by

$$\widetilde A\cdot z_k = r_k \qquad\text{and}\qquad \widetilde A^T\cdot \bar z_k = \bar r_k \tag{2.7.42}$$

and modifies the definitions of $\alpha_k$, $\beta_k$, $p_k$, and $\bar p_k$ in equation (2.7.32):

$$\begin{aligned}
\alpha_k &= \frac{\bar z_k \cdot r_k}{\bar p_k \cdot A \cdot p_k} \\
\beta_k &= \frac{\bar z_{k+1} \cdot r_{k+1}}{\bar z_k \cdot r_k} \\
p_{k+1} &= z_{k+1} + \beta_k\, p_k \\
\bar p_{k+1} &= \bar z_{k+1} + \beta_k\, \bar p_k
\end{aligned} \tag{2.7.43}$$

For linbcg, below, we will ask you to supply routines that solve the auxiliary linear systems (2.7.42). If you have no idea what to use for the preconditioner $\widetilde A$, then use the diagonal part of $A$, or even the identity matrix, in which case the burden of convergence will be entirely on the biconjugate gradient method itself.

The routine linbcg, below, is based on a program originally written by Anne Greenbaum. (See [13] for a different, less sophisticated, implementation.) There are a few wrinkles you should know about.

What constitutes "good" convergence is rather application dependent. The routine linbcg therefore provides for four possibilities, selected by setting the flag itol on input. If itol=1, iteration stops when the quantity $|A\cdot x - b|/|b|$ is less than the input quantity tol. If itol=2, the required criterion is

$$|\widetilde A^{-1}\cdot(A\cdot x - b)| \,/\, |\widetilde A^{-1}\cdot b| < \text{tol} \tag{2.7.44}$$

If itol=3, the routine uses its own estimate of the error in x, and requires its magnitude, divided by the magnitude of x, to be less than tol. The setting itol=4 is the same as itol=3, except that the largest (in absolute value) component of the error and largest component of x are used instead of the vector magnitude (that is, the $L_\infty$ norm instead of the $L_2$ norm). You may need to experiment to find which of these convergence criteria is best for your problem.

On output, err is the tolerance actually achieved. If the returned count iter does not indicate that the maximum number of allowed iterations itmax was exceeded, then err should be less than tol. If you want to do further iterations, leave all returned quantities as they are and call the routine again. The routine loses its memory of the spanned conjugate gradient subspace between calls, however, so you should not force it to return more often than about every N iterations.

Finally, note that linbcg is furnished in double precision, since it will usually be used when N is quite large.
#include <stdio.h>
#include <math.h>
#include "nrutil.h"
#define EPS 1.0e-14

void linbcg(unsigned long n, double b[], double x[], int itol, double tol,
    int itmax, int *iter, double *err)
Solves A·x = b for x[1..n], given b[1..n], by the iterative biconjugate gradient method.
On input x[1..n] should be set to an initial guess of the solution (or all zeros); itol is 1,2,3,
or 4, specifying which convergence test is applied (see text); itmax is the maximum number
of allowed iterations; and tol is the desired convergence tolerance. On output, x[1..n] is
reset to the improved solution, iter is the number of iterations actually taken, and err is the
estimated error. The matrix A is referenced only through the user-supplied routines atimes,
which computes the product of either A or its transpose on a vector; and asolve, which solves
Ã·x = b or Ãᵀ·x = b for some preconditioner matrix Ã (possibly the trivial diagonal part of A).
{
    void asolve(unsigned long n, double b[], double x[], int itrnsp);
    void atimes(unsigned long n, double x[], double r[], int itrnsp);
    double snrm(unsigned long n, double sx[], int itol);
    unsigned long j;
    double ak,akden,bk,bkden,bknum,bnrm,dxnrm,xnrm,zm1nrm,znrm;
    double *p,*pp,*r,*rr,*z,*zz;    /* Double precision is a good idea in this routine. */

    p=dvector(1,n);
    pp=dvector(1,n);
    r=dvector(1,n);
    rr=dvector(1,n);
    z=dvector(1,n);
    zz=dvector(1,n);

    *iter=0;                        /* Calculate initial residual. */
    atimes(n,x,r,0);                /* Input to atimes is x[1..n], output is r[1..n];
                                       the final 0 indicates that the matrix (not its
                                       transpose) is to be used. */
    for (j=1;j<=n;j++) {
        r[j]=b[j]-r[j];
        rr[j]=r[j];
    }
    /* atimes(n,r,rr,0); */         /* Uncomment this line to get the "minimum residual"
                                       variant of the algorithm. */
    if (itol == 1) {
        bnrm=snrm(n,b,itol);
        asolve(n,r,z,0);            /* Input to asolve is r[1..n], output is z[1..n];
                                       the final 0 indicates that the matrix Ã (not
                                       its transpose) is to be used. */
    }
    else if (itol == 2) {
        asolve(n,b,z,0);
        bnrm=snrm(n,z,itol);
        asolve(n,r,z,0);
    }
    else if (itol == 3 || itol == 4) {
        asolve(n,b,z,0);
        bnrm=snrm(n,z,itol);
        asolve(n,r,z,0);
        znrm=snrm(n,z,itol);
    } else nrerror("illegal itol in linbcg");
    while (*iter <= itmax) {        /* Main loop. */
        ++(*iter);
        asolve(n,rr,zz,1);          /* Final 1 indicates use of transpose matrix Ãᵀ. */
        for (bknum=0.0,j=1;j<=n;j++) bknum += z[j]*rr[j];
        /* Calculate coefficient bk and direction vectors p and pp. */
        if (*iter == 1) {
            for (j=1;j<=n;j++) {
                p[j]=z[j];
                pp[j]=zz[j];
            }
        }
        else {
            bk=bknum/bkden;
            for (j=1;j<=n;j++) {
                p[j]=bk*p[j]+z[j];
                pp[j]=bk*pp[j]+zz[j];
            }
        }
        bkden=bknum;                /* Calculate coefficient ak, new iterate x, and new
                                       residuals r and rr. */
        atimes(n,p,z,0);
        for (akden=0.0,j=1;j<=n;j++) akden += z[j]*pp[j];
        ak=bknum/akden;
        atimes(n,pp,zz,1);
        for (j=1;j<=n;j++) {
            x[j] += ak*p[j];
            r[j] -= ak*z[j];
            rr[j] -= ak*zz[j];
        }
        asolve(n,r,z,0);            /* Solve Ã·z = r and check stopping criterion. */
        if (itol == 1)
            *err=snrm(n,r,itol)/bnrm;
        else if (itol == 2)
            *err=snrm(n,z,itol)/bnrm;

        else if (itol == 3 || itol == 4) {
            zm1nrm=znrm;
            znrm=snrm(n,z,itol);
            if (fabs(zm1nrm-znrm) > EPS*znrm) {
                dxnrm=fabs(ak)*snrm(n,p,itol);
                *err=znrm/fabs(zm1nrm-znrm)*dxnrm;
            } else {
                *err=znrm/bnrm;     /* Error may not be accurate, so loop again. */
                continue;
            }
            xnrm=snrm(n,x,itol);
            if (*err <= 0.5*xnrm) *err /= xnrm;
            else {
                *err=znrm/bnrm;     /* Error may not be accurate, so loop again. */
                continue;
            }
        }
        printf("iter=%4d err=%12.6f\n",*iter,*err);
        if (*err <= tol) break;
    }

    free_dvector(p,1,n);
    free_dvector(pp,1,n);
    free_dvector(r,1,n);
    free_dvector(rr,1,n);
    free_dvector(z,1,n);
    free_dvector(zz,1,n);
}

The routine linbcg uses this short utility for computing vector norms:

#include <math.h>

double snrm(unsigned long n, double sx[], int itol)
Compute one of two norms for a vector sx[1..n], as signaled by itol. Used by linbcg.
{
    unsigned long i,isamax;
    double ans;

    if (itol <= 3) {
        ans = 0.0;
        for (i=1;i<=n;i++) ans += sx[i]*sx[i];   /* Vector magnitude norm. */
        return sqrt(ans);
    } else {
        isamax=1;                                /* Largest component norm. */
        for (i=1;i<=n;i++) {
            if (fabs(sx[i]) > fabs(sx[isamax])) isamax=i;
        }
        return fabs(sx[isamax]);
    }
}
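To see the pieces fit together, here is a minimal driver (an illustration added here, not from the original): it hard-codes a tiny 3 × 3 test matrix directly in row-indexed storage in the globals sa and ija, so that the atimes and asolve versions listed next can use it, and then calls linbcg. It assumes linbcg, snrm, atimes, asolve, and the double-precision routines dsprsax and dsprstx are compiled and linked together.

#include <stdio.h>

double sa[6]={0.0,4.0,3.0,2.0,0.0,1.0};      /* diag(4,3,2) plus element a[1][2]=1 */
unsigned long ija[6]={0,5,6,6,6,2};          /* row-index storage, N=3 */

void linbcg(unsigned long n, double b[], double x[], int itol, double tol,
    int itmax, int *iter, double *err);

int main(void)
{
    double b[4]={0.0,6.0,6.0,6.0};           /* chosen so the solution is (1,2,3) */
    double x[4]={0.0,0.0,0.0,0.0};           /* initial guess: all zeros */
    double err;
    int iter;

    linbcg(3,b,x,1,1.0e-9,50,&iter,&err);
    printf("x = (%g, %g, %g)\n",x[1],x[2],x[3]);
    return 0;
}

With these values the iteration should converge to x = (1, 2, 3) in a few steps.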

So that the specifications for the routines atimes and asolve are clear, we list here simple versions that assume a matrix A stored somewhere in row-index sparse format.

extern unsigned long ija[];
extern double sa[];                  /* The matrix is stored somewhere. */

void atimes(unsigned long n, double x[], double r[], int itrnsp)
{
    void dsprsax(double sa[], unsigned long ija[], double x[], double b[],
        unsigned long n);
    void dsprstx(double sa[], unsigned long ija[], double x[], double b[],
        unsigned long n);
    /* These are double versions of sprsax and sprstx. */

    if (itrnsp) dsprstx(sa,ija,x,r,n);
    else dsprsax(sa,ija,x,r,n);
}

extern unsigned long ija[];
extern double sa[];                  /* The matrix is stored somewhere. */

void asolve(unsigned long n, double b[], double x[], int itrnsp)
{
    unsigned long i;

    for(i=1;i<=n;i++) x[i]=(sa[i] != 0.0 ? b[i]/sa[i] : b[i]);
    /* The matrix Ã is the diagonal part of A, stored in the first n elements of sa.
       Since the transpose matrix has the same diagonal, the flag itrnsp is not used. */
}

CITED REFERENCES AND FURTHER READING:

Tewarson, R.P. 1973, Sparse Matrices (New York: Academic Press). [1]

Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter I.3 (by J.K. Reid). [2]

George, A., and Liu, J.W.H. 1981, Computer Solution of Large Sparse Positive Definite Systems (Englewood Cliffs, NJ: Prentice-Hall). [3]

NAG Fortran Library (Numerical Algorithms Group, 256 Banbury Road, Oxford OX2 7DE, U.K.). [4]

IMSL Math/Library Users Manual (IMSL Inc., 2500 CityWest Boulevard, Houston TX 77042). [5]

Eisenstat, S.C., Gursky, M.C., Schultz, M.H., and Sherman, A.H. 1977, Yale Sparse Matrix Package, Technical Reports 112 and 114 (Yale University Department of Computer Science). [6]

Knuth, D.E. 1968, Fundamental Algorithms, vol. 1 of The Art of Computer Programming (Reading, MA: Addison-Wesley), §2.2.6. [7]

Kincaid, D.R., Respess, J.R., Young, D.M., and Grimes, R.G. 1982, ACM Transactions on Mathematical Software, vol. 8, pp. 302–322. [8]

PCGPAK User's Guide (New Haven: Scientific Computing Associates, Inc.). [9]

Bentley, J. 1986, Programming Pearls (Reading, MA: Addison-Wesley), §9. [10]

Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), Chapters 4 and 10, particularly §§10.2–10.3. [11]

Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 8. [12]

Baker, L. 1991, More C Tools for Scientists and Engineers (New York: McGraw-Hill). [13]

Fletcher, R. 1976, in Numerical Analysis Dundee 1975, Lecture Notes in Mathematics, vol. 506, A. Dold and B. Eckmann, eds. (Berlin: Springer-Verlag), pp. 73–89. [14]

Saad, Y., and Schulz, M. 1986, SIAM Journal on Scientific and Statistical Computing, vol. 7, pp. 856–869. [15]

Bunch, J.R., and Rose, D.J. (eds.) 1976, Sparse Matrix Computations (New York: Academic Press).

Duff, I.S., and Stewart, G.W. (eds.) 1979, Sparse Matrix Proceedings 1978 (Philadelphia: S.I.A.M.).

2.8 Vandermonde Matrices and Toeplitz Matrices

In §2.4 the case of a tridiagonal matrix was treated specially, because that particular type of linear system admits a solution in only of order N operations, rather than of order N³ for the general linear problem. When such particular types exist, it is important to know about them. Your computational savings, should you ever happen to be working on a problem that involves the right kind of particular type, can be enormous.

This section treats two special types of matrices that can be solved in of order N² operations, not as good as tridiagonal, but a lot better than the general case. (Other than the operations count, these two types have nothing in common.) Matrices of the first type, termed Vandermonde matrices, occur in some problems having to do with the fitting of polynomials, the reconstruction of distributions from their moments, and also other contexts. In this book, for example, a Vandermonde problem crops up in §3.5. Matrices of the second type, termed Toeplitz matrices, tend to occur in problems involving deconvolution and signal processing. In this book, a Toeplitz problem is encountered in §13.7.

These are not the only special types of matrices worth knowing about. The Hilbert matrices, whose components are of the form $a_{ij} = 1/(i+j-1)$, $i,j = 1,\ldots,N$, can be inverted by an exact integer algorithm, and are very difficult to invert in any other way, since they are notoriously ill-conditioned (see [1] for details). The Sherman-Morrison and Woodbury formulas, discussed in §2.7, can sometimes be used to convert new special forms into old ones. Reference [2] gives some other special forms. We have not found these additional forms to arise as frequently as the two that we now discuss.

Vandermonde Matrices

A Vandermonde matrix of size $N \times N$ is completely determined by $N$ arbitrary numbers $x_1, x_2, \ldots, x_N$, in terms of which its $N^2$ components are the integer powers $x_i^{j-1}$, $i,j = 1,\ldots,N$. Evidently there are two possible such forms, depending on whether we view the $i$'s as rows, $j$'s as columns, or vice versa. In the former case, we get a linear system of equations that looks like this,

$$\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{N-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{N-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^{N-1}
\end{bmatrix} \cdot
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \tag{2.8.1}$$

Performing the matrix multiplication, you will see that this equation solves for the unknown coefficients $c_j$ which fit a polynomial to the $N$ pairs of abscissas and ordinates $(x_j, y_j)$. Precisely this problem will arise in §3.5, and the routine given there will solve (2.8.1) by the method that we are about to describe.
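For concreteness (an illustration added here, not in the original), with $N = 3$ and abscissas $x = (1, 2, 3)$ the system (2.8.1) reads

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}\cdot
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

whose solution gives the unique quadratic $c_1 + c_2 x + c_3 x^2$ passing through the three points $(1, y_1)$, $(2, y_2)$, $(3, y_3)$.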

The alternative identification of rows and columns leads to the set of equations

$$\begin{bmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_N \\
x_1^2 & x_2^2 & \cdots & x_N^2 \\
\vdots & \vdots & & \vdots \\
x_1^{N-1} & x_2^{N-1} & \cdots & x_N^{N-1}
\end{bmatrix} \cdot
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_N \end{bmatrix} =
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_N \end{bmatrix} \tag{2.8.2}$$

Write this out and you will see that it relates to the problem of moments: Given the values of $N$ points $x_i$, find the unknown weights $w_i$, assigned so as to match the given values $q_j$ of the first $N$ moments. (For more on this problem, consult [3].) The routine given in this section solves (2.8.2).

The method of solution of both (2.8.1) and (2.8.2) is closely related to Lagrange's polynomial interpolation formula, which we will not formally meet until §3.1 below. Notwithstanding, the following derivation should be comprehensible: Let $P_j(x)$ be the polynomial of degree $N - 1$ defined by

$$P_j(x) = \prod_{\substack{n=1 \\ n \ne j}}^{N} \frac{x - x_n}{x_j - x_n} = \sum_{k=1}^{N} A_{jk}\, x^{k-1} \tag{2.8.3}$$

Here the meaning of the last equality is to define the components of the matrix $A_{ij}$ as the coefficients that arise when the product is multiplied out and like terms collected.

The polynomial $P_j(x)$ is a function of $x$ generally. But you will notice that it is specifically designed so that it takes on a value of zero at all $x_i$ with $i \ne j$, and has a value of unity at $x = x_j$. In other words,

$$P_j(x_i) = \delta_{ij} = \sum_{k=1}^{N} A_{jk}\, x_i^{k-1} \tag{2.8.4}$$

But (2.8.4) says that $A_{jk}$ is exactly the inverse of the matrix of components $x_i^{k-1}$, which appears in (2.8.2), with the subscript as the column index. Therefore the solution of (2.8.2) is just that matrix inverse times the right-hand side,

$$w_j = \sum_{k=1}^{N} A_{jk}\, q_k \tag{2.8.5}$$

As for the transpose problem (2.8.1), we can use the fact that the inverse of the transpose is the transpose of the inverse, so

$$c_j = \sum_{k=1}^{N} A_{kj}\, y_k \tag{2.8.6}$$

The routine in §3.5 implements this.

It remains to find a good way of multiplying out the monomial terms in (2.8.3), in order to get the components of $A_{jk}$. This is essentially a bookkeeping problem, and we will let you read the routine itself to see how it can be solved. One trick is to define a master $P(x)$ by

$$P(x) \equiv \prod_{n=1}^{N} (x - x_n) \tag{2.8.7}$$

work out its coefficients, and then obtain the numerators and denominators of the specific $P_j$'s via synthetic division by the one supernumerary term. (See §5.3 for more on synthetic division.) Since each such division is only a process of order $N$, the total procedure is of order $N^2$.

You should be warned that Vandermonde systems are notoriously ill-conditioned, by their very nature. (As an aside anticipating §5.8, the reason is the same as that which makes Chebyshev fitting so impressively accurate: there exist high-order polynomials that are very good uniform fits to zero.
Hence roundoff error can introduce rather substantial coefficients of the leading terms of these polynomials.) It is a good idea always to compute Vandermonde problems in double precision.

The routine for (2.8.2) which follows is due to G.B. Rybicki.

#include "nrutil.h"

void vander(double x[], double w[], double q[], int n)
Solves the Vandermonde linear system $\sum_{i=1}^{N} x_i^{k-1} w_i = q_k$ ($k = 1,\ldots,N$). Input consists of
the vectors x[1..n] and q[1..n]; the vector w[1..n] is output.
{
    int i,j,k;
    double b,s,t,xx;
    double *c;

    c=dvector(1,n);
    if (n == 1) w[1]=q[1];
    else {
        for (i=1;i<=n;i++) c[i]=0.0;     /* Initialize array. */
        c[n] = -x[1];                    /* Coefficients of the master polynomial are */
        for (i=2;i<=n;i++) {             /* found by recursion. */
            xx = -x[i];
            for (j=(n+1-i);j<=(n-1);j++) c[j] += xx*c[j+1];
            c[n] += xx;
        }
        for (i=1;i<=n;i++) {             /* Each subfactor in turn */
            xx=x[i];
            t=b=1.0;
            s=q[n];
            for (k=n;k>=2;k--) {         /* is synthetically divided, */
                b=c[k]+xx*b;
                s += q[k-1]*b;           /* matrix-multiplied by the right-hand side, */
                t=xx*t+b;
            }
            w[i]=s/t;                    /* and supplied with a denominator. */
        }
    }
    free_dvector(c,1,n);
}
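A quick check (an illustration added here, not in the original; it assumes vander and the nrutil routines are linked): for the moments problem with x = (1, 2, 3) and unit weights, the first three moments are q = (3, 6, 14); feeding those q into vander should recover the unit weights.

#include <stdio.h>

void vander(double x[], double w[], double q[], int n);

int main(void)
{
    double x[4]={0.0,1.0,2.0,3.0};       /* abscissas (1-based) */
    double q[4]={0.0,3.0,6.0,14.0};      /* moments of unit weights at x */
    double w[4];
    int i;

    vander(x,w,q,3);
    for (i=1;i<=3;i++) printf("w[%d] = %g\n",i,w[i]);   /* expect 1 1 1 */
    return 0;
}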

Toeplitz Matrices

An $N \times N$ Toeplitz matrix is specified by giving $2N - 1$ numbers $R_k$, $k = -N+1, \ldots, -1, 0, 1, \ldots, N-1$. Those numbers are then emplaced as matrix elements constant along the (upper-left to lower-right) diagonals of the matrix:

$$\begin{bmatrix}
R_0 & R_{-1} & R_{-2} & \cdots & R_{-(N-2)} & R_{-(N-1)} \\
R_1 & R_0 & R_{-1} & \cdots & R_{-(N-3)} & R_{-(N-2)} \\
R_2 & R_1 & R_0 & \cdots & R_{-(N-4)} & R_{-(N-3)} \\
\vdots & & & \ddots & & \vdots \\
R_{N-2} & R_{N-3} & R_{N-4} & \cdots & R_0 & R_{-1} \\
R_{N-1} & R_{N-2} & R_{N-3} & \cdots & R_1 & R_0
\end{bmatrix} \tag{2.8.8}$$

The linear Toeplitz problem can thus be written as

$$\sum_{j=1}^{N} R_{i-j}\, x_j = y_i \qquad (i = 1,\ldots,N) \tag{2.8.9}$$

where the $x_j$'s, $j = 1,\ldots,N$, are the unknowns to be solved for.

The Toeplitz matrix is symmetric if $R_k = R_{-k}$ for all $k$. Levinson [4] developed an algorithm for fast solution of the symmetric Toeplitz problem, by a bordering method, that is, a recursive procedure that solves the $M$-dimensional Toeplitz problem

$$\sum_{j=1}^{M} R_{i-j}\, x_j^{(M)} = y_i \qquad (i = 1,\ldots,M) \tag{2.8.10}$$

in turn for $M = 1, 2, \ldots$ until $M = N$, the desired result, is finally reached. The vector $x_j^{(M)}$ is the result at the $M$th stage, and becomes the desired answer only when $N$ is reached.

Levinson's method is well documented in standard texts (e.g., [5]). The useful fact that the method generalizes to the nonsymmetric case seems to be less well known. At some risk of excessive detail, we therefore give a derivation here, due to G.B. Rybicki.

In following a recursion from step $M$ to step $M+1$ we find that our developing solution $x^{(M)}$ changes in this way:

$$\sum_{j=1}^{M} R_{i-j}\, x_j^{(M)} = y_i \qquad i = 1,\ldots,M \tag{2.8.11}$$

becomes

$$\sum_{j=1}^{M} R_{i-j}\, x_j^{(M+1)} + R_{i-(M+1)}\, x_{M+1}^{(M+1)} = y_i \qquad i = 1,\ldots,M+1 \tag{2.8.12}$$

By eliminating $y_i$ we find

$$\sum_{j=1}^{M} R_{i-j} \left( \frac{x_j^{(M)} - x_j^{(M+1)}}{x_{M+1}^{(M+1)}} \right) = R_{i-(M+1)} \qquad i = 1,\ldots,M \tag{2.8.13}$$

or by letting $i \to M+1-i$ and $j \to M+1-j$,

$$\sum_{j=1}^{M} R_{j-i}\, G_j^{(M)} = R_{-i} \tag{2.8.14}$$

where

$$G_j^{(M)} \equiv \frac{x_{M+1-j}^{(M)} - x_{M+1-j}^{(M+1)}}{x_{M+1}^{(M+1)}} \tag{2.8.15}$$

To put this another way,

$$x_{M+1-j}^{(M+1)} = x_{M+1-j}^{(M)} - x_{M+1}^{(M+1)}\, G_j^{(M)} \qquad j = 1,\ldots,M \tag{2.8.16}$$

Thus, if we can use recursion to find the order $M$ quantities $x^{(M)}$ and $G^{(M)}$ and the single order $M+1$ quantity $x_{M+1}^{(M+1)}$, then all of the other $x_j^{(M+1)}$ will follow. Fortunately, the quantity $x_{M+1}^{(M+1)}$ follows from equation (2.8.12) with $i = M+1$,

$$\sum_{j=1}^{M} R_{M+1-j}\, x_j^{(M+1)} + R_0\, x_{M+1}^{(M+1)} = y_{M+1} \tag{2.8.17}$$

For the unknown order $M+1$ quantities $x_j^{(M+1)}$ we can substitute the previous order quantities in $G$ since

$$G_{M+1-j}^{(M)} = \frac{x_j^{(M)} - x_j^{(M+1)}}{x_{M+1}^{(M+1)}} \tag{2.8.18}$$

The result of this operation is

$$x_{M+1}^{(M+1)} = \frac{\sum_{j=1}^{M} R_{M+1-j}\, x_j^{(M)} - y_{M+1}}{\sum_{j=1}^{M} R_{M+1-j}\, G_{M+1-j}^{(M)} - R_0} \tag{2.8.19}$$

The only remaining problem is to develop a recursion relation for $G$. Before we do that, however, we should point out that there are actually two distinct sets of solutions to the original linear problem for a nonsymmetric matrix, namely right-hand solutions (which we have been discussing) and left-hand solutions $z_i$. The formalism for the left-hand solutions differs only in that we deal with the equations

$$\sum_{j=1}^{M} R_{j-i}\, z_j^{(M)} = y_i \qquad (i = 1,\ldots,M) \tag{2.8.20}$$

Then, the same sequence of operations on this set leads to

$$\sum_{j=1}^{M} R_{i-j}\, H_j^{(M)} = R_i \tag{2.8.21}$$

where

$$H_j^{(M)} \equiv \frac{z_{M+1-j}^{(M)} - z_{M+1-j}^{(M+1)}}{z_{M+1}^{(M+1)}} \tag{2.8.22}$$

(compare with 2.8.14 – 2.8.15). The reason for mentioning the left-hand solutions now is that, by equation (2.8.21), the $H_j$ satisfy exactly the same equation as the $x_j$ except for the substitution $y_i \to R_i$ on the right-hand side. Therefore we can quickly deduce from equation (2.8.19) that

$$H_{M+1}^{(M+1)} = \frac{\sum_{j=1}^{M} R_{M+1-j}\, H_j^{(M)} - R_{M+1}}{\sum_{j=1}^{M} R_{M+1-j}\, G_{M+1-j}^{(M)} - R_0} \tag{2.8.23}$$

By the same token, $G$ satisfies the same equation as $z$, except for the substitution $y_i \to R_{-i}$. This gives

$$G_{M+1}^{(M+1)} = \frac{\sum_{j=1}^{M} R_{j-M-1}\, G_j^{(M)} - R_{-M-1}}{\sum_{j=1}^{M} R_{j-M-1}\, H_{M+1-j}^{(M)} - R_0} \tag{2.8.24}$$

The same "morphism" also turns equation (2.8.16), and its partner for $z$, into the final equations

$$\begin{aligned}
G_j^{(M+1)} &= G_j^{(M)} - G_{M+1}^{(M+1)}\, H_{M+1-j}^{(M)} \\
H_j^{(M+1)} &= H_j^{(M)} - H_{M+1}^{(M+1)}\, G_{M+1-j}^{(M)}
\end{aligned} \tag{2.8.25}$$

Now, starting with the initial values

$$x_1^{(1)} = y_1/R_0 \qquad G_1^{(1)} = R_{-1}/R_0 \qquad H_1^{(1)} = R_1/R_0 \tag{2.8.26}$$

we can recurse away. At each stage $M$ we use equations (2.8.23) and (2.8.24) to find $H_{M+1}^{(M+1)}$, $G_{M+1}^{(M+1)}$, and then equation (2.8.25) to find the other components of $H^{(M+1)}$, $G^{(M+1)}$. From there the vectors $x^{(M+1)}$ and/or $z^{(M+1)}$ are easily calculated.

The program below does this. It incorporates the second equation in (2.8.25) in the form

$$H_{M+1-j}^{(M+1)} = H_{M+1-j}^{(M)} - H_{M+1}^{(M+1)}\, G_j^{(M)} \tag{2.8.27}$$

so that the computation can be done "in place."

Notice that the above algorithm fails if $R_0 = 0$. In fact, because the bordering method does not allow pivoting, the algorithm will fail if any of the diagonal principal minors of the original Toeplitz matrix vanish. (Compare with discussion of the tridiagonal algorithm in §2.4.) If the algorithm fails, your matrix is not necessarily singular; you might just have to solve your problem by a slower and more general algorithm such as LU decomposition with pivoting.

The routine that implements equations (2.8.23)–(2.8.27) is also due to Rybicki. Note that the routine's r[n+j] array is equal to $R_j$ above, so that subscripts on the r array vary from 1 to 2N−1.

#include "nrutil.h"
#define FREERETURN {free_vector(h,1,n);free_vector(g,1,n);return;}

void toeplz(float r[], float x[], float y[], int n)
Solves the Toeplitz system $\sum_{j=1}^{N} R_{(N+i-j)}\, x_j = y_i$ ($i = 1,\ldots,N$). The Toeplitz matrix need
not be symmetric. y[1..n] and r[1..2*n-1] are input arrays; x[1..n] is the output array.
{
    int j,k,m,m1,m2;
    float pp,pt1,pt2,qq,qt1,qt2,sd,sgd,sgn,shn,sxn;
    float *g,*h;

    if (r[n] == 0.0) nrerror("toeplz-1 singular principal minor");
    g=vector(1,n);
    h=vector(1,n);
    x[1]=y[1]/r[n];                      /* Initialize for the recursion. */
    if (n == 1) FREERETURN
    g[1]=r[n-1]/r[n];
    h[1]=r[n+1]/r[n];
    for (m=1;m<=n;m++) {                 /* Main loop over the recursion. */
        m1=m+1;
        sxn = -y[m1];                    /* Compute numerator and denominator for x, */
        sd = -r[n];
        for (j=1;j<=m;j++) {
            sxn += r[n+m1-j]*x[j];
            sd += r[n+m1-j]*g[m-j+1];
        }
        if (sd == 0.0) nrerror("toeplz-2 singular principal minor");
        x[m1]=sxn/sd;                    /* whence x. */
        for (j=1;j<=m;j++) x[j] -= x[m1]*g[m-j+1];
        if (m1 == n) FREERETURN
        sgn = -r[n-m1];                  /* Compute numerator and denominator for G and H, */
        shn = -r[n+m1];
        sgd = -r[n];
        for (j=1;j<=m;j++) {
            sgn += r[n+j-m1]*g[j];
            shn += r[n+m1-j]*h[j];
            sgd += r[n+j-m1]*h[m-j+1];
        }
        if (sgd == 0.0) nrerror("toeplz-3 singular principal minor");
        g[m1]=sgn/sgd;                   /* whence G and H. */
        h[m1]=shn/sd;
        k=m;
        m2=(m+1) >> 1;
        pp=g[m1];
        qq=h[m1];
        for (j=1;j<=m2;j++) {
            pt1=g[j];
            pt2=g[k];
            qt1=h[j];
            qt2=h[k];
            g[j]=pt1-pp*qt2;
            g[k]=pt2-pp*qt1;
            h[j]=qt1-qq*pt2;
            h[k--]=qt2-qq*pt1;
        }
    }                                    /* Back for another recurrence. */
    nrerror("toeplz - should not arrive here!");
}
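As a small worked example (added here, not in the original; it assumes toeplz and the nrutil routines are linked), consider the 3 × 3 nonsymmetric Toeplitz matrix with $R_{-2},\ldots,R_2 = 1, 2, 5, 3, 1$, whose rows are (5 2 1), (3 5 2), (1 3 5):

#include <stdio.h>

void toeplz(float r[], float x[], float y[], int n);

int main(void)
{
    /* r[n+j] holds R_j, so r[1..5] = R_{-2},R_{-1},R_0,R_1,R_2 */
    float r[6]={0.0,1.0,2.0,5.0,3.0,1.0};
    float y[4]={0.0,8.0,10.0,9.0};       /* chosen so the solution is (1,1,1) */
    float x[4];
    int i;

    toeplz(r,x,y,3);
    for (i=1;i<=3;i++) printf("x[%d] = %g\n",i,x[i]);
    return 0;
}

Each row of the test matrix sums to the corresponding entry of y, so the printed solution should be (1, 1, 1).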

If you are in the business of solving very large Toeplitz systems, you should find out about so-called "new, fast" algorithms, which require only on the order of $N(\log N)^2$ operations, compared to $N^2$ for Levinson's method. These methods are too complicated to include here. Papers by Bunch [6] and de Hoog [7] will give entry to the literature.

CITED REFERENCES AND FURTHER READING:

Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), Chapter 5 [also treats some other special forms].

Forsythe, G.E., and Moler, C.B. 1967, Computer Solution of Linear Algebraic Systems (Englewood Cliffs, NJ: Prentice-Hall), §19. [1]

Westlake, J.R. 1968, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations (New York: Wiley). [2]

von Mises, R. 1964, Mathematical Theory of Probability and Statistics (New York: Academic Press), pp. 394ff. [3]

Levinson, N., Appendix B of N. Wiener, 1949, Extrapolation, Interpolation and Smoothing of Stationary Time Series (New York: Wiley). [4]

Robinson, E.A., and Treitel, S. 1980, Geophysical Signal Analysis (Englewood Cliffs, NJ: Prentice-Hall), pp. 163ff. [5]

Bunch, J.R. 1985, SIAM Journal on Scientific and Statistical Computing, vol. 6, pp. 349–364. [6]

de Hoog, F. 1987, Linear Algebra and Its Applications, vol. 88/89, pp. 123–138. [7]

2.9 Cholesky Decomposition

If a square matrix A happens to be symmetric and positive definite, then it has a special, more efficient, triangular decomposition. Symmetric means that $a_{ij} = a_{ji}$ for $i,j = 1,\ldots,N$, while positive definite means that

$$v\cdot A\cdot v > 0 \quad\text{for all vectors } v \tag{2.9.1}$$

(In Chapter 11 we will see that positive definite has the equivalent interpretation that A has all positive eigenvalues.) While symmetric, positive definite matrices are rather special, they occur quite frequently in some applications, so their special factorization, called Cholesky decomposition, is good to know about. When you can use it, Cholesky decomposition is about a factor of two faster than alternative methods for solving linear equations.

Instead of seeking arbitrary lower and upper triangular factors L and U, Cholesky decomposition constructs a lower triangular matrix L whose transpose $L^T$ can itself serve as the upper triangular part. In other words we replace equation (2.3.1) by

$$L\cdot L^T = A \tag{2.9.2}$$

This factorization is sometimes referred to as "taking the square root" of the matrix A. The components of $L^T$ are of course related to those of L by

$$L^T_{ij} = L_{ji} \tag{2.9.3}$$

Writing out equation (2.9.2) in components, one readily obtains the analogs of equations (2.3.12)–(2.3.13),

$$L_{ii} = \left( a_{ii} - \sum_{k=1}^{i-1} L_{ik}^2 \right)^{1/2} \tag{2.9.4}$$

and

$$L_{ji} = \frac{1}{L_{ii}} \left( a_{ij} - \sum_{k=1}^{i-1} L_{ik}\, L_{jk} \right) \qquad j = i+1, i+2, \ldots, N \tag{2.9.5}$$

If you apply equations (2.9.4) and (2.9.5) in the order $i = 1, 2, \ldots, N$, you will see that the L's that occur on the right-hand side are already determined by the time they are needed. Also, only components $a_{ij}$ with $j \ge i$ are referenced. (Since A is symmetric, these have complete information.) It is convenient, then, to have the factor L overwrite the subdiagonal (lower triangular but not including the diagonal) part of A, preserving the input upper triangular values of A. Only one extra vector of length N is needed to store the diagonal part of L. The operations count is $N^3/6$ executions of the inner loop (consisting of one multiply and one subtract), with also N square roots. As already mentioned, this is about a factor 2 better than LU decomposition of A (where its symmetry would be ignored).

A straightforward implementation is

#include <math.h>

void choldc(float **a, int n, float p[])
/* Given a positive-definite symmetric matrix a[1..n][1..n], this routine constructs its Cholesky
decomposition, A = L · L^T. On input, only the upper triangle of a need be given; it is not
modified. The Cholesky factor L is returned in the lower triangle of a, except for its diagonal
elements which are returned in p[1..n]. */
{
    void nrerror(char error_text[]);
    int i,j,k;
    float sum;

    for (i=1;i<=n;i++) {
        for (j=i;j<=n;j++) {
            for (sum=a[i][j],k=i-1;k>=1;k--) sum -= a[i][k]*a[j][k];
            if (i == j) {
                if (sum <= 0.0)     /* a, with rounding errors, is not positive definite. */
                    nrerror("choldc failed");
                p[i]=sqrt(sum);
            } else a[j][i]=sum/p[i];
        }
    }
}

You might at this point wonder about pivoting. The pleasant answer is that Cholesky decomposition is extremely stable numerically, without any pivoting at all. Failure of choldc simply indicates that the matrix A (or, with roundoff error, another very nearby matrix) is not positive definite. In fact, choldc is an efficient way to test whether a symmetric matrix is positive definite. (In this application, you will want to replace the call to nrerror with some less drastic signaling method.)

Once your matrix is decomposed, the triangular factor can be used to solve a linear equation by backsubstitution. The straightforward implementation of this is

void cholsl(float **a, int n, float p[], float b[], float x[])
/* Solves the set of n linear equations A · x = b, where a is a positive-definite symmetric matrix.
a[1..n][1..n] and p[1..n] are input as the output of the routine choldc. Only the lower
subdiagonal portion of a is accessed. b[1..n] is input as the right-hand side vector. The
solution vector is returned in x[1..n]. a, n, and p are not modified and can be left in place
for successive calls with different right-hand sides b. b is not modified unless you identify b and
x in the calling sequence, which is allowed. */
{
    int i,k;
    float sum;

    for (i=1;i<=n;i++) {        /* Solve L · y = b, storing y in x. */
        for (sum=b[i],k=i-1;k>=1;k--) sum -= a[i][k]*x[k];
        x[i]=sum/p[i];
    }
    for (i=n;i>=1;i--) {        /* Solve L^T · x = y. */
        for (sum=x[i],k=i+1;k<=n;k++) sum -= a[k][i]*x[k];

        x[i]=sum/p[i];
    }
}

A typical use of choldc and cholsl is in the inversion of covariance matrices describing the fit of data to a model; see, e.g., §15.6. In this, and many other applications, one often needs $\mathbf{L}^{-1}$. The lower triangle of this matrix can be efficiently found from the output of choldc:

    for (i=1;i<=n;i++) {
        a[i][i]=1.0/p[i];
        for (j=i+1;j<=n;j++) {
            sum=0.0;
            for (k=i;k<j;k++) sum -= a[j][k]*a[k][i];
            a[j][i]=sum/p[j];
        }
    }
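A minimal usage sketch (our own test values, assuming the nrutil.h support routines are linked in): factor once with choldc, then backsubstitute with cholsl:

#include <stdio.h>
#include "nrutil.h"

void choldc(float **a, int n, float p[]);
void cholsl(float **a, int n, float p[], float b[], float x[]);

int main(void)
{
    int i,n=2;
    float **a=matrix(1,n,1,n),*p=vector(1,n),*b=vector(1,n),*x=vector(1,n);

    a[1][1]=4.0; a[1][2]=2.0;   /* only the upper triangle need be filled */
    a[2][1]=0.0; a[2][2]=3.0;
    b[1]=10.0; b[2]=8.0;        /* A.x = b has solution x = (1.75, 1.5)   */
    choldc(a,n,p);              /* L now occupies the lower triangle and p */
    cholsl(a,n,p,b,x);
    for (i=1;i<=n;i++) printf("x[%d] = %f\n",i,x[i]);
    free_vector(x,1,n); free_vector(b,1,n); free_vector(p,1,n);
    free_matrix(a,1,n,1,n);
    return 0;
}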

2.10 QR Decomposition

The standard algorithm for the QR decomposition involves successive Householder transformations (to be discussed later in §11.2). We write a Householder matrix in the form $\mathbf{1} - \mathbf{u} \otimes \mathbf{u}/c$ where $c = \frac{1}{2}\,\mathbf{u} \cdot \mathbf{u}$. An appropriate Householder matrix applied to a given matrix can zero all elements in a column of the matrix situated below a chosen element. Thus we arrange for the first Householder matrix $\mathbf{Q}_1$ to zero all elements in the first column of A below the first element. Similarly $\mathbf{Q}_2$ zeroes all elements in the second column below the second element, and so on up to $\mathbf{Q}_{n-1}$. Thus

$$ \mathbf{R} = \mathbf{Q}_{n-1} \cdots \mathbf{Q}_1 \cdot \mathbf{A} \qquad (2.10.5) $$

Since the Householder matrices are orthogonal,

$$ \mathbf{Q} = (\mathbf{Q}_{n-1} \cdots \mathbf{Q}_1)^{-1} = \mathbf{Q}_1 \cdots \mathbf{Q}_{n-1} \qquad (2.10.6) $$

In most applications we don't need to form Q explicitly; we instead store it in the factored form (2.10.6). Pivoting is not usually necessary unless the matrix A is very close to singular. A general QR algorithm for rectangular matrices including pivoting is given in [1]. For square matrices, an implementation is the following:

#include <math.h>
#include "nrutil.h"

void qrdcmp(float **a, int n, float *c, float *d, int *sing)
/* Constructs the QR decomposition of a[1..n][1..n]. The upper triangular matrix R is re-
turned in the upper triangle of a, except for the diagonal elements of R which are returned in
d[1..n]. The orthogonal matrix Q is represented as a product of n−1 Householder matrices
Q_1 ... Q_{n−1}, where Q_j = 1 − u_j ⊗ u_j/c_j. The ith component of u_j is zero for
i = 1,...,j−1 while the nonzero components are returned in a[i][j] for i = j,...,n. sing
returns as true (1) if singularity is encountered during the decomposition, but the decomposition
is still completed in this case; otherwise it returns false (0). */
{
    int i,j,k;
    float scale,sigma,sum,tau;

    *sing=0;
    for (k=1;k<n;k++) {
        scale=0.0;
        for (i=k;i<=n;i++) scale=FMAX(scale,fabs(a[i][k]));
        if (scale == 0.0) {             /* Singular case. */
            *sing=1;
            c[k]=d[k]=0.0;
        } else {                        /* Form Q_k and Q_k · A. */
            for (i=k;i<=n;i++) a[i][k] /= scale;
            for (sum=0.0,i=k;i<=n;i++) sum += SQR(a[i][k]);
            sigma=SIGN(sqrt(sum),a[k][k]);
            a[k][k] += sigma;
            c[k]=sigma*a[k][k];
            d[k] = -scale*sigma;
            for (j=k+1;j<=n;j++) {
                for (sum=0.0,i=k;i<=n;i++) sum += a[i][k]*a[i][j];
                tau=sum/c[k];
                for (i=k;i<=n;i++) a[i][j] -= tau*a[i][k];
            }
        }
    }
    d[n]=a[n][n];
    if (d[n] == 0.0) *sing=1;
}

void qrsolv(float **a, int n, float c[], float d[], float b[])
/* Solves the set of n linear equations A · x = b. a[1..n][1..n], c[1..n], and d[1..n] are
input as the output of the routine qrdcmp and are not modified. b[1..n] is input as the
right-hand side vector, and is overwritten with the solution vector on output. */
{
    void rsolv(float **a, int n, float d[], float b[]);
    int i,j;
    float sum,tau;

    for (j=1;j<n;j++) {                 /* Form Q^T · b. */
        for (sum=0.0,i=j;i<=n;i++) sum += a[i][j]*b[i];
        tau=sum/c[j];
        for (i=j;i<=n;i++) b[i] -= tau*a[i][j];
    }
    rsolv(a,n,d,b);                     /* Solve R · x = Q^T · b. */
}

void rsolv(float **a, int n, float d[], float b[])
/* Solves the set of n linear equations R · x = b, where R is an upper triangular matrix stored
in a and d. a[1..n][1..n] and d[1..n] are input as the output of the routine qrdcmp and
are not modified. b[1..n] is input as the right-hand side vector, and is overwritten with the
solution vector on output. */
{
    int i,j;
    float sum;

    b[n] /= d[n];
    for (i=n-1;i>=1;i--) {
        for (sum=0.0,j=i+1;j<=n;j++) sum += a[i][j]*b[j];
        b[i]=(b[i]-sum)/d[i];
    }
}

See [2] for details on how to use QR decomposition for constructing orthogonal bases, and for solving least-squares problems. (We prefer to use SVD, §2.6, for these purposes, because of its greater diagnostic capability in pathological cases.)

Updating a QR decomposition

Some numerical algorithms involve solving a succession of linear systems each of which differs only slightly from its predecessor. Instead of doing $O(N^3)$ operations each time to solve the equations from scratch, one can often update a matrix factorization in $O(N^2)$ operations and use the new factorization to solve the next set of linear equations. The LU decomposition is complicated to update because of pivoting. However, QR turns out to be quite simple for a very common kind of update,

$$ \mathbf{A} \to \mathbf{A} + \mathbf{s} \otimes \mathbf{t} \qquad (2.10.7) $$

(compare equation 2.7.1). In practice it is more convenient to work with the equivalent form

$$ \mathbf{A} = \mathbf{Q} \cdot \mathbf{R} \quad \to \quad \mathbf{A}' = \mathbf{Q}' \cdot \mathbf{R}' = \mathbf{Q} \cdot (\mathbf{R} + \mathbf{u} \otimes \mathbf{v}) \qquad (2.10.8) $$

One can go back and forth between equations (2.10.7) and (2.10.8) using the fact that Q is orthogonal, giving

$$ \mathbf{v} = \mathbf{t} \quad \text{and either} \quad \mathbf{s} = \mathbf{Q} \cdot \mathbf{u} \quad \text{or} \quad \mathbf{u} = \mathbf{Q}^T \cdot \mathbf{s} \qquad (2.10.9) $$

The algorithm [2] has two phases. In the first we apply $N - 1$ Jacobi rotations (§11.1) to reduce $\mathbf{R} + \mathbf{u} \otimes \mathbf{v}$ to upper Hessenberg form. Another $N - 1$ Jacobi rotations transform this upper Hessenberg matrix to the new upper triangular matrix $\mathbf{R}'$. The matrix $\mathbf{Q}'$ is simply the product of Q with the $2(N - 1)$ Jacobi rotations. In applications we usually want $\mathbf{Q}^T$, and the algorithm can easily be rearranged to work with this matrix instead of with Q.
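A minimal usage sketch (the test matrix and right-hand side are ours; nrutil.c is assumed linked in): decompose once with qrdcmp, then solve with qrsolv, which overwrites b:

#include <stdio.h>
#include "nrutil.h"

void qrdcmp(float **a, int n, float *c, float *d, int *sing);
void qrsolv(float **a, int n, float c[], float d[], float b[]);

int main(void)
{
    int i,n=2,sing;
    float **a=matrix(1,n,1,n),*c=vector(1,n),*d=vector(1,n),*b=vector(1,n);

    a[1][1]=2.0; a[1][2]=1.0;
    a[2][1]=1.0; a[2][2]=3.0;
    b[1]=5.0; b[2]=10.0;        /* the solution should be x = (1, 3) */
    qrdcmp(a,n,c,d,&sing);
    if (sing) nrerror("singularity encountered in qrdcmp");
    qrsolv(a,n,c,d,b);          /* b is overwritten with the solution */
    for (i=1;i<=n;i++) printf("x[%d] = %f\n",i,b[i]);
    free_vector(b,1,n); free_vector(d,1,n); free_vector(c,1,n);
    free_matrix(a,1,n,1,n);
    return 0;
}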

#include <math.h>
#include "nrutil.h"

void qrupdt(float **r, float **qt, int n, float u[], float v[])
/* Given the QR decomposition of some n × n matrix, calculates the QR decomposition of the
matrix Q · (R + u ⊗ v). The quantities are dimensioned as r[1..n][1..n], qt[1..n][1..n],
u[1..n], and v[1..n]. Note that Q^T is input and returned in qt. */
{
    void rotate(float **r, float **qt, int n, int i, float a, float b);
    int i,j,k;

    for (k=n;k>=1;k--) {                /* Find largest k such that u[k] != 0. */
        if (u[k]) break;
    }
    if (k < 1) k=1;
    for (i=k-1;i>=1;i--) {              /* Transform R + u ⊗ v to upper Hessenberg. */
        rotate(r,qt,n,i,u[i],-u[i+1]);
        if (u[i] == 0.0) u[i]=fabs(u[i+1]);
        else if (fabs(u[i]) > fabs(u[i+1]))
            u[i]=fabs(u[i])*sqrt(1.0+SQR(u[i+1]/u[i]));
        else u[i]=fabs(u[i+1])*sqrt(1.0+SQR(u[i]/u[i+1]));
    }
    for (j=1;j<=n;j++) r[1][j] += u[1]*v[j];
    for (i=1;i<k;i++)                   /* Transform upper Hessenberg matrix to upper triangular. */
        rotate(r,qt,n,i,r[i][i],-r[i+1][i]);
}

#include <math.h>
#include "nrutil.h"

void rotate(float **r, float **qt, int n, int i, float a, float b)
/* Given matrices r[1..n][1..n] and qt[1..n][1..n], carry out a Jacobi rotation on rows i
and i+1 of each matrix. a and b are the parameters of the rotation:
cos θ = a/sqrt(a^2+b^2), sin θ = b/sqrt(a^2+b^2). */
{
    int j;
    float c,fact,s,w,y;

    if (a == 0.0) {                     /* Avoid unnecessary overflow or underflow. */
        c=0.0;
        s=(b >= 0.0 ? 1.0 : -1.0);
    } else if (fabs(a) > fabs(b)) {
        fact=b/a;
        c=SIGN(1.0/sqrt(1.0+(fact*fact)),a);
        s=fact*c;
    } else {
        fact=a/b;
        s=SIGN(1.0/sqrt(1.0+(fact*fact)),b);
        c=fact*s;
    }
    for (j=i;j<=n;j++) {                /* Premultiply r by Jacobi rotation. */
        y=r[i][j];
        w=r[i+1][j];
        r[i][j]=c*y-s*w;
        r[i+1][j]=s*y+c*w;
    }
    for (j=1;j<=n;j++) {                /* Premultiply qt by Jacobi rotation. */
        y=qt[i][j];
        w=qt[i+1][j];
        qt[i][j]=c*y-s*w;
        qt[i+1][j]=s*y+c*w;
    }
}
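Note that qrupdt wants Q^T and R as explicit matrices, while qrdcmp leaves Q in factored Householder form. One way to bridge the two, sketched here purely as an illustration (the helper name and packaging are ours, not the book's), is to apply the factored Q^T to each column of the identity, the same loop qrsolv applies to b, and to copy R out of a's upper triangle and d:

void qt_and_r_from_qrdcmp(float **a, int n, float c[], float d[],
    float **qt, float **r)
/* Expand the qrdcmp output (a,c,d) into explicit qt = Q^T and r = R,
   dimensioned qt[1..n][1..n] and r[1..n][1..n], ready for qrupdt. */
{
    int i,j,k;
    float sum,tau;

    for (i=1;i<=n;i++)                  /* start qt as the identity matrix */
        for (j=1;j<=n;j++)
            qt[i][j]=(i == j ? 1.0 : 0.0);
    for (k=1;k<n;k++) {                 /* apply each Householder factor... */
        if (c[k]) {
            for (j=1;j<=n;j++) {        /* ...to every column of qt */
                sum=0.0;
                for (i=k;i<=n;i++) sum += a[i][k]*qt[i][j];
                tau=sum/c[k];
                for (i=k;i<=n;i++) qt[i][j] -= tau*a[i][k];
            }
        }
    }
    for (i=1;i<=n;i++) {                /* R: upper triangle of a, diagonal in d */
        for (j=1;j<i;j++) r[i][j]=0.0;
        r[i][i]=d[i];
        for (j=i+1;j<=n;j++) r[i][j]=a[i][j];
    }
}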

We will make use of QR decomposition, and its updating, in §9.7.

CITED REFERENCES AND FURTHER READING:
Wilkinson, J.H., and Reinsch, C. 1971, Linear Algebra, vol. II of Handbook for Automatic Computation (New York: Springer-Verlag), Chapter I/8. [1]
Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), §§5.2, 5.3, 12.6. [2]

2.11 Is Matrix Inversion an $N^3$ Process?

We close this chapter with a little entertainment, a bit of algorithmic prestidigitation which probes more deeply into the subject of matrix inversion. We start with a seemingly simple question: How many individual multiplications does it take to perform the matrix multiplication of two 2 × 2 matrices,

$$ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \cdot \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \qquad (2.11.1) $$

Eight, right? Here they are written explicitly:

$$ \begin{aligned} c_{11} &= a_{11} \times b_{11} + a_{12} \times b_{21} \\ c_{12} &= a_{11} \times b_{12} + a_{12} \times b_{22} \\ c_{21} &= a_{21} \times b_{11} + a_{22} \times b_{21} \\ c_{22} &= a_{21} \times b_{12} + a_{22} \times b_{22} \end{aligned} \qquad (2.11.2) $$

Do you think that one can write formulas for the c's that involve only seven multiplications? (Try it yourself, before reading on.)

Such a set of formulas was, in fact, discovered by Strassen [1]. The formulas are:

$$ \begin{aligned} Q_1 &\equiv (a_{11} + a_{22}) \times (b_{11} + b_{22}) \\ Q_2 &\equiv (a_{21} + a_{22}) \times b_{11} \\ Q_3 &\equiv a_{11} \times (b_{12} - b_{22}) \\ Q_4 &\equiv a_{22} \times (-b_{11} + b_{21}) \\ Q_5 &\equiv (a_{11} + a_{12}) \times b_{22} \\ Q_6 &\equiv (-a_{11} + a_{21}) \times (b_{11} + b_{12}) \\ Q_7 &\equiv (a_{12} - a_{22}) \times (b_{21} + b_{22}) \end{aligned} \qquad (2.11.3) $$

in terms of which

$$ \begin{aligned} c_{11} &= Q_1 + Q_4 - Q_5 + Q_7 \\ c_{21} &= Q_2 + Q_4 \\ c_{12} &= Q_3 + Q_5 \\ c_{22} &= Q_1 + Q_3 - Q_2 + Q_6 \end{aligned} \qquad (2.11.4) $$

What's the use of this? There is one fewer multiplication than in equation (2.11.2), but many more additions and subtractions. It is not clear that anything has been gained. But notice that in (2.11.3) the a's and b's are never commuted. Therefore (2.11.3) and (2.11.4) are valid when the a's and b's are themselves matrices. The problem of multiplying two very large matrices (of order $N = 2^m$ for some integer m) can now be broken down recursively by partitioning the matrices into quarters, sixteenths, etc. And note the key point: The savings is not just a factor "7/8"; it is that factor at each hierarchical level of the recursion. In total it reduces the process of matrix multiplication to order $N^{\log_2 7}$ instead of $N^3$.

What about all the extra additions in (2.11.3)–(2.11.4)? Don't they outweigh the advantage of the fewer multiplications? For large N, it turns out that there are six times as many additions as multiplications implied by (2.11.3)–(2.11.4). But, if N is very large, this constant factor is no match for the change in the exponent from $N^3$ to $N^{\log_2 7}$.

With this "fast" matrix multiplication, Strassen also obtained a surprising result for matrix inversion [1]. Suppose that the matrices

$$ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \qquad (2.11.5) $$

are inverses of each other. Then the c's can be obtained from the a's by the following operations (compare equations 2.7.22 and 2.7.25):

$$ \begin{aligned} R_1 &= \text{Inverse}(a_{11}) \\ R_2 &= a_{21} \times R_1 \\ R_3 &= R_1 \times a_{12} \\ R_4 &= a_{21} \times R_3 \\ R_5 &= R_4 - a_{22} \\ R_6 &= \text{Inverse}(R_5) \\ c_{12} &= R_3 \times R_6 \\ c_{21} &= R_6 \times R_2 \\ R_7 &= R_3 \times c_{21} \\ c_{11} &= R_1 - R_7 \\ c_{22} &= -R_6 \end{aligned} \qquad (2.11.6) $$

In (2.11.6) the "inverse" operator occurs just twice. It is to be interpreted as the reciprocal if the a's and c's are scalars, but as matrix inversion if the a's and c's are themselves submatrices. Imagine doing the inversion of a very large matrix, of order $N = 2^m$, recursively by partitions in half. At each step, halving the order doubles the number of inverse operations. But this means that there are only N divisions in all! So divisions don't dominate in the recursive use of (2.11.6). Equation (2.11.6) is dominated, in fact, by its 6 multiplications. Since these can be done by an $N^{\log_2 7}$ algorithm, so can the matrix inversion!

This is fun, but let's look at practicalities: If you estimate how large N has to be before the difference between exponent 3 and exponent $\log_2 7 = 2.807$ is substantial enough to outweigh the bookkeeping overhead, arising from the complicated nature of the recursive Strassen algorithm, you will find that LU decomposition is in no immediate danger of becoming obsolete.

If, on the other hand, you like this kind of fun, then try these: (1) Can you multiply the complex numbers $(a+ib)$ and $(c+id)$ in only three real multiplications? [Answer: see §5.4.] (2) Can you evaluate a general fourth-degree polynomial in x for many different values of x with only three multiplications per evaluation? [Answer: see §5.3.]

CITED REFERENCES AND FURTHER READING:
Strassen, V. 1969, Numerische Mathematik, vol. 13, pp. 354–356. [1]
Kronsjö, L. 1987, Algorithms: Their Complexity and Efficiency, 2nd ed. (New York: Wiley).
Winograd, S. 1971, Linear Algebra and Its Applications, vol. 4, pp. 381–388.
Pan, V. Ya. 1980, SIAM Journal on Computing, vol. 9, pp. 321–342.
Pan, V. 1984, How to Multiply Matrices Faster, Lecture Notes in Computer Science, vol. 179 (New York: Springer-Verlag).
Pan, V. 1984, SIAM Review, vol. 26, pp. 393–415. [More recent results that show that an exponent of 2.496 can be achieved (theoretically!)]
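It is easy to convince yourself numerically that (2.11.3)–(2.11.4) really do reproduce the eight-multiplication products of (2.11.2); here is a throwaway check (our own harness, with arbitrary test values, not part of the book's routines):

#include <stdio.h>

int main(void)
{
    double a11=1.0,a12=2.0,a21=3.0,a22=4.0;     /* arbitrary test entries */
    double b11=5.0,b12=6.0,b21=7.0,b22=8.0;
    double q1=(a11+a22)*(b11+b22);              /* the seven Strassen products */
    double q2=(a21+a22)*b11;
    double q3=a11*(b12-b22);
    double q4=a22*(-b11+b21);
    double q5=(a11+a12)*b22;
    double q6=(-a11+a21)*(b11+b12);
    double q7=(a12-a22)*(b21+b22);

    /* each line should print the same number twice (19, 22, 43, 50 here) */
    printf("c11: %g vs %g\n",q1+q4-q5+q7,a11*b11+a12*b21);
    printf("c12: %g vs %g\n",q3+q5,a11*b12+a12*b22);
    printf("c21: %g vs %g\n",q2+q4,a21*b11+a22*b21);
    printf("c22: %g vs %g\n",q1+q3-q2+q6,a21*b12+a22*b22);
    return 0;
}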

Chapter 3. Interpolation and Extrapolation

3.0 Introduction

We sometimes know the value of a function $f(x)$ at a set of points $x_1, x_2, \ldots, x_N$ (say, with $x_1 < \ldots < x_N$), but we don't have an analytic expression for $f(x)$ that lets us calculate its value at an arbitrary point.

For example, consider the function

$$ f(x) = 3x^2 + \frac{\ln\left[(\pi - x)^2\right]}{\pi^4} + 1 \qquad (3.0.1) $$

which is well-behaved everywhere except at $x = \pi$, very mildly singular at $x = \pi$, and otherwise takes on all positive and negative values. Any interpolation based on the values $x = 3.13, 3.14, 3.15, 3.16$, will assuredly get a very wrong answer for the value $x = 3.1416$, even though a graph plotting those five points looks really quite smooth! (Try it on your calculator.)

Because pathologies can lurk anywhere, it is highly desirable that an interpolation and extrapolation routine should provide an estimate of its own error. Such an error estimate can never be foolproof, of course. We could have a function that, for reasons known only to its maker, takes off wildly and unexpectedly between two tabulated points. Interpolation always presumes some degree of smoothness for the function interpolated, but within this framework of presumption, deviations from smoothness can be detected.

Conceptually, the interpolation process has two stages: (1) Fit an interpolating function to the data points provided. (2) Evaluate that interpolating function at the target point x.

However, this two-stage method is generally not the best way to proceed in practice. Typically it is computationally less efficient, and more susceptible to roundoff error, than methods which construct a functional estimate $f(x)$ directly from the N tabulated values every time one is desired. Most practical schemes start at a nearby point $f(x_i)$, then add a sequence of (hopefully) decreasing corrections, as information from other $f(x_i)$'s is incorporated. The procedure typically takes $O(N^2)$ operations. If everything is well behaved, the last correction will be the smallest, and it can be used as an informal (though not rigorous) bound on the error.

In the case of polynomial interpolation, it sometimes does happen that the coefficients of the interpolating polynomial are of interest, even though their use in evaluating the interpolating function should be frowned on. We deal with this eventuality in §3.5.

Local interpolation, using a finite number of "nearest-neighbor" points, gives interpolated values $f(x)$ that do not, in general, have continuous first or higher derivatives. That happens because, as x crosses the tabulated values $x_i$, the interpolation scheme switches which tabulated points are the "local" ones. (If such a switch is allowed to occur anywhere else, then there will be a discontinuity in the interpolated function itself at that point. Bad idea!)

In situations where continuity of derivatives is a concern, one must use the "stiffer" interpolation provided by a so-called spline function. A spline is a polynomial between each pair of table points, but one whose coefficients are determined "slightly" nonlocally. The nonlocality is designed to guarantee global smoothness in the interpolated function up to some order of derivative.
Cubic splines (§3.3) are the most popular. They produce an interpolated function that is continuous through the second derivative. Splines tend to be stabler than polynomials, with less possibility of wild oscillation between the tabulated points.

The number of points (minus one) used in an interpolation scheme is called the order of the interpolation. Increasing the order does not necessarily increase the accuracy, especially in polynomial interpolation. If the added points are distant from the point of interest x, the resulting higher-order polynomial, with its additional constrained points, tends to oscillate wildly between the tabulated values. This oscillation may have no relation at all to the behavior of the "true" function (see Figure 3.0.1). Of course, adding points close to the desired point usually does help, but a finer mesh implies a larger table of values, not always available.

Figure 3.0.1. (a) A smooth function (solid line) is more accurately interpolated by a high-order polynomial (shown schematically as dotted line) than by a low-order polynomial (shown as a piecewise linear dashed line). (b) A function with sharp corners or rapidly changing higher derivatives is less accurately approximated by a high-order polynomial (dotted line), which is "too stiff," than by a low-order polynomial (dashed lines). Even some smooth functions, such as exponentials or rational functions, can be badly approximated by high-order polynomials.

Unless there is solid evidence that the interpolating function is close in form to the true function f, it is a good idea to be cautious about high-order interpolation. We enthusiastically endorse interpolations with 3 or 4 points, we are perhaps tolerant of 5 or 6; but we rarely go higher than that unless there is quite rigorous monitoring of estimated errors.

When your table of values contains many more points than the desirable order of interpolation, you must begin each interpolation with a search for the right "local" place in the table. While not strictly a part of the subject of interpolation, this task is important enough (and often enough botched) that we devote §3.4 to its discussion.

The routines given for interpolation are also routines for extrapolation. An important application, in Chapter 16, is their use in the integration of ordinary differential equations. There, considerable care is taken with the monitoring of errors. Otherwise, the dangers of extrapolation cannot be overemphasized: An interpolating function, which is perforce an extrapolating function, will typically go berserk when the argument x is outside the range of tabulated values by more than the typical spacing of tabulated points.

Interpolation can be done in more than one dimension, e.g., for a function

$f(x, y, z)$. Multidimensional interpolation is often accomplished by a sequence of one-dimensional interpolations. We discuss this in §3.6.

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §25.2.
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 2.
Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 3.
Kahaner, D., Moler, C., and Nash, S. 1989, Numerical Methods and Software (Englewood Cliffs, NJ: Prentice Hall), Chapter 4.
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), Chapter 5.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), Chapter 3.
Isaacson, E., and Keller, H.B. 1966, Analysis of Numerical Methods (New York: Wiley), Chapter 6.

3.1 Polynomial Interpolation and Extrapolation

Through any two points there is a unique line. Through any three points, a unique quadratic. Et cetera. The interpolating polynomial of degree $N - 1$ through the N points $y_1 = f(x_1), y_2 = f(x_2), \ldots, y_N = f(x_N)$ is given explicitly by Lagrange's classical formula,

$$ P(x) = \frac{(x-x_2)(x-x_3)\cdots(x-x_N)}{(x_1-x_2)(x_1-x_3)\cdots(x_1-x_N)}\, y_1 + \frac{(x-x_1)(x-x_3)\cdots(x-x_N)}{(x_2-x_1)(x_2-x_3)\cdots(x_2-x_N)}\, y_2 + \cdots + \frac{(x-x_1)(x-x_2)\cdots(x-x_{N-1})}{(x_N-x_1)(x_N-x_2)\cdots(x_N-x_{N-1})}\, y_N \qquad (3.1.1) $$

There are N terms, each a polynomial of degree $N - 1$ and each constructed to be zero at all of the $x_i$ except one, at which it is constructed to be $y_i$.

It is not terribly wrong to implement the Lagrange formula straightforwardly, but it is not terribly right either. The resulting algorithm gives no error estimate, and it is also somewhat awkward to program. A much better algorithm (for constructing the same, unique, interpolating polynomial) is Neville's algorithm, closely related to and sometimes confused with Aitken's algorithm, the latter now considered obsolete.

Let $P_1$ be the value at x of the unique polynomial of degree zero (i.e., a constant) passing through the point $(x_1, y_1)$; so $P_1 = y_1$. Likewise define $P_2, P_3, \ldots, P_N$. Now let $P_{12}$ be the value at x of the unique polynomial of degree one passing through both $(x_1, y_1)$ and $(x_2, y_2)$. Likewise $P_{23}, P_{34}, \ldots, P_{(N-1)N}$. Similarly, for higher-order polynomials, up to $P_{123\ldots N}$, which is the value of the unique interpolating polynomial through all N points, i.e., the desired answer.
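Before building up Neville's tableau, here for comparison is what the straightforward implementation of (3.1.1), warned about above, looks like; a minimal sketch (the routine name is ours, not one of the book's). Note that it returns no error estimate, which is precisely the complaint:

void lagrange(float xa[], float ya[], int n, float x, float *y)
/* Direct evaluation of Lagrange's formula (3.1.1).  Arrays xa[1..n] and
   ya[1..n] are as in polint below; the value P(x) is returned in *y. */
{
    int i,j;
    float term;

    *y=0.0;
    for (i=1;i<=n;i++) {        /* one term of (3.1.1) per tabulated point */
        term=ya[i];
        for (j=1;j<=n;j++)
            if (j != i) term *= (x-xa[j])/(xa[i]-xa[j]);
        *y += term;
    }
}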

The various P's form a "tableau" with "ancestors" on the left leading to a single "descendant" at the extreme right. For example, with $N = 4$:

    x1:  y1 = P1
                     P12
    x2:  y2 = P2            P123
                     P23            P1234        (3.1.2)
    x3:  y3 = P3            P234
                     P34
    x4:  y4 = P4

Neville's algorithm is a recursive way of filling in the numbers in the tableau a column at a time, from left to right. It is based on the relationship between a "daughter" P and its two "parents,"

$$ P_{i(i+1)\ldots(i+m)} = \frac{(x - x_{i+m})\,P_{i(i+1)\ldots(i+m-1)} + (x_i - x)\,P_{(i+1)(i+2)\ldots(i+m)}}{x_i - x_{i+m}} \qquad (3.1.3) $$

This recurrence works because the two parents already agree at points $x_{i+1} \ldots x_{i+m-1}$.

An improvement on the recurrence (3.1.3) is to keep track of the small differences between parents and daughters, namely to define (for $m = 1, 2, \ldots, N-1$),

$$ C_{m,i} \equiv P_{i\ldots(i+m)} - P_{i\ldots(i+m-1)}, \qquad D_{m,i} \equiv P_{i\ldots(i+m)} - P_{(i+1)\ldots(i+m)} \qquad (3.1.4) $$

Then one can easily derive from (3.1.3) the relations

$$ D_{m+1,i} = \frac{(x_{i+m+1} - x)(C_{m,i+1} - D_{m,i})}{x_i - x_{i+m+1}}, \qquad C_{m+1,i} = \frac{(x_i - x)(C_{m,i+1} - D_{m,i})}{x_i - x_{i+m+1}} \qquad (3.1.5) $$

At each level m, the C's and D's are the corrections that make the interpolation one order higher. The final answer $P_{1\ldots N}$ is equal to the sum of any $y_i$ plus a set of C's and/or D's that form a path through the family tree to the rightmost daughter.

Here is a routine for polynomial interpolation or extrapolation from N input points. Note that the input arrays are assumed to be unit-offset. If you have zero-offset arrays, remember to subtract 1 (see §1.2):

#include <math.h>
#include "nrutil.h"

void polint(float xa[], float ya[], int n, float x, float *y, float *dy)
/* Given arrays xa[1..n] and ya[1..n], and given a value x, this routine returns a value y,
and an error estimate dy. If P(x) is the polynomial of degree N−1 such that P(xa_i) = ya_i,
i = 1,...,n, then the returned value y = P(x). */
{
    int i,m,ns=1;
    float den,dif,dift,ho,hp,w;

    float *c,*d;

    dif=fabs(x-xa[1]);
    c=vector(1,n);
    d=vector(1,n);
    for (i=1;i<=n;i++) {                /* Here we find the index ns of the closest table entry, */
        if ( (dift=fabs(x-xa[i])) < dif) {
            ns=i;
            dif=dift;
        }
        c[i]=ya[i];                     /* and initialize the tableau of c's and d's. */
        d[i]=ya[i];
    }
    *y=ya[ns--];                        /* This is the initial approximation to y. */
    for (m=1;m<n;m++) {                 /* For each column of the tableau, */
        for (i=1;i<=n-m;i++) {          /* we loop over the current c's and d's and update them. */
            ho=xa[i]-x;
            hp=xa[i+m]-x;
            w=c[i+1]-d[i];
            if ( (den=ho-hp) == 0.0) nrerror("Error in routine polint");
            /* This error can occur only if two input xa's are (to within roundoff) identical. */
            den=w/den;
            d[i]=hp*den;                /* Here the c's and d's are updated. */
            c[i]=ho*den;
        }
        *y += (*dy=(2*ns < (n-m) ? c[ns+1] : d[ns--]));
        /* After each column in the tableau is completed, we decide which correction, c or d,
           to add to the accumulating value of y, taking the straightest possible path through
           the tableau to its apex and updating ns accordingly.  The last dy added is the
           error indication. */
    }
    free_vector(d,1,n);
    free_vector(c,1,n);
}
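A minimal driver for polint (our own test table; nrutil.c assumed linked in):

#include <stdio.h>
#include <math.h>
#include "nrutil.h"

void polint(float xa[], float ya[], int n, float x, float *y, float *dy);

int main(void)
{
    int i,n=4;
    float *xa=vector(1,n),*ya=vector(1,n),y,dy;

    for (i=1;i<=n;i++) {
        xa[i]=0.2*i;            /* tabulate sin(x) at x = 0.2, 0.4, 0.6, 0.8 */
        ya[i]=sin(xa[i]);
    }
    polint(xa,ya,n,0.5,&y,&dy);
    printf("interpolated %f, true %f, estimated error %g\n",y,sin(0.5),dy);
    free_vector(ya,1,n); free_vector(xa,1,n);
    return 0;
}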

3.2 Rational Function Interpolation and Extrapolation

Some functions are not well approximated by polynomials, but are well approximated by rational functions, that is, quotients of polynomials. We denote by $R_{i(i+1)\ldots(i+m)}$ a rational function passing through the $m + 1$ points $(x_i, y_i) \ldots (x_{i+m}, y_{i+m})$. More explicitly, suppose

$$ R_{i(i+1)\ldots(i+m)} = \frac{P_\mu(x)}{Q_\nu(x)} = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu} \qquad (3.2.1) $$

Since there are $\mu + \nu + 1$ unknown p's and q's ($q_0$ being arbitrary), we must have

$$ m + 1 = \mu + \nu + 1 \qquad (3.2.2) $$

In specifying a rational function interpolating function, you must give the desired order of both the numerator and the denominator.

Rational functions are sometimes superior to polynomials, roughly speaking, because of their ability to model functions with poles, that is, zeros of the denominator of equation (3.2.1). These poles might occur for real values of x, if the function to be interpolated itself has poles. More often, the function $f(x)$ is finite for all finite real x, but has an analytic continuation with poles in the complex x-plane. Such poles can themselves ruin a polynomial approximation, even one restricted to real values of x, just as they can ruin the convergence of an infinite power series in x. If you draw a circle in the complex plane around your m tabulated points, then you should not expect polynomial interpolation to be good unless the nearest pole is rather far outside the circle. A rational function approximation, by contrast, will stay "good" as long as it has enough powers of x in its denominator to account for (cancel) any nearby poles.

For the interpolation problem, a rational function is constructed so as to go through a chosen set of tabulated functional values. However, we should also mention in passing that rational function approximations can be used in analytic work. One sometimes constructs a rational function approximation by the criterion that the rational function of equation (3.2.1) itself have a power series expansion that agrees with the first $m + 1$ terms of the power series expansion of the desired function $f(x)$. This is called Padé approximation, and is discussed in §5.12.

Bulirsch and Stoer found an algorithm of the Neville type which performs rational function extrapolation on tabulated data. A tableau like that of equation (3.1.2) is constructed column by column, leading to a result and an error estimate. The Bulirsch-Stoer algorithm produces the so-called diagonal rational function, with the degrees of numerator and denominator equal (if m is even) or with the degree of the denominator larger by one (if m is odd, cf. equation 3.2.2 above). For the derivation of the algorithm, refer to [1]. The algorithm is summarized by a recurrence

relation exactly analogous to equation (3.1.3) for polynomial approximation:

$$ R_{i(i+1)\ldots(i+m)} = R_{(i+1)\ldots(i+m)} + \frac{R_{(i+1)\ldots(i+m)} - R_{i(i+1)\ldots(i+m-1)}}{\left(\dfrac{x - x_i}{x - x_{i+m}}\right)\left(1 - \dfrac{R_{(i+1)\ldots(i+m)} - R_{i(i+1)\ldots(i+m-1)}}{R_{(i+1)\ldots(i+m)} - R_{(i+1)\ldots(i+m-1)}}\right) - 1} \qquad (3.2.3) $$

This recurrence generates the rational functions through $m + 1$ points from the ones through m and (the term $R_{(i+1)\ldots(i+m-1)}$ in equation 3.2.3) $m - 1$ points. It is started with

$$ R_i = y_i \qquad (3.2.4) $$

and with

$$ R \equiv \left[ R_{i(i+1)\ldots(i+m)} \ \text{with} \ m = -1 \right] = 0 \qquad (3.2.5) $$

Now, exactly as in equations (3.1.4) and (3.1.5) above, we can convert the recurrence (3.2.3) to one involving only the small differences

$$ C_{m,i} \equiv R_{i\ldots(i+m)} - R_{i\ldots(i+m-1)}, \qquad D_{m,i} \equiv R_{i\ldots(i+m)} - R_{(i+1)\ldots(i+m)} \qquad (3.2.6) $$

Note that these satisfy the relation

$$ C_{m+1,i} - D_{m+1,i} = C_{m,i+1} - D_{m,i} \qquad (3.2.7) $$

which is useful in proving the recurrences

$$ D_{m+1,i} = \frac{C_{m,i+1}(C_{m,i+1} - D_{m,i})}{\dfrac{x - x_i}{x - x_{i+m+1}} D_{m,i} - C_{m,i+1}}, \qquad C_{m+1,i} = \frac{\dfrac{x - x_i}{x - x_{i+m+1}} D_{m,i}(C_{m,i+1} - D_{m,i})}{\dfrac{x - x_i}{x - x_{i+m+1}} D_{m,i} - C_{m,i+1}} \qquad (3.2.8) $$

This recurrence is implemented in the following function, whose use is analogous in every way to polint in §3.1. Note again that unit-offset input arrays are assumed (§1.2).

#include <math.h>
#include "nrutil.h"
#define TINY 1.0e-25                    /* A small number. */
#define FREERETURN {free_vector(d,1,n);free_vector(c,1,n);return;}

void ratint(float xa[], float ya[], int n, float x, float *y, float *dy)
/* Given arrays xa[1..n] and ya[1..n], and given a value of x, this routine returns a value of
y and an accuracy estimate dy. The value returned is that of the diagonal rational function,
evaluated at x, which passes through the n points (xa_i, ya_i), i = 1...n. */
{
    int m,i,ns=1;
    float w,t,hh,h,dd,*c,*d;

    c=vector(1,n);
    d=vector(1,n);
    hh=fabs(x-xa[1]);
    for (i=1;i<=n;i++) {
        h=fabs(x-xa[i]);
        if (h == 0.0) {
            *y=ya[i];
            *dy=0.0;
            FREERETURN
        } else if (h < hh) {
            ns=i;
            hh=h;
        }
        c[i]=ya[i];
        d[i]=ya[i]+TINY;                /* The TINY part is needed to prevent a rare
                                           zero-over-zero condition. */
    }
    *y=ya[ns--];
    for (m=1;m<n;m++) {
        for (i=1;i<=n-m;i++) {
            w=c[i+1]-d[i];
            h=xa[i+m]-x;                /* h will never be zero, since this was tested in
                                           the initializing loop. */
            t=(xa[i]-x)*d[i]/h;
            dd=t-c[i+1];
            if (dd == 0.0) nrerror("Error in routine ratint");
            /* This error condition indicates that the interpolating function has a pole
               at the requested value of x. */
            dd=w/dd;
            d[i]=c[i+1]*dd;
            c[i]=t*dd;
        }
        *y += (*dy=(2*ns < (n-m) ? c[ns+1] : d[ns--]));
    }
    FREERETURN
}

3.3 Cubic Spline Interpolation

Given a tabulated function $y_i = y(x_i)$, $i = 1, \ldots, N$, focus attention on one particular interval, between $x_j$ and $x_{j+1}$. Linear interpolation in that interval gives the interpolation formula

$$ y = A y_j + B y_{j+1} \qquad (3.3.1) $$

where

$$ A \equiv \frac{x_{j+1} - x}{x_{j+1} - x_j} \qquad B \equiv 1 - A = \frac{x - x_j}{x_{j+1} - x_j} \qquad (3.3.2) $$

Equations (3.3.1) and (3.3.2) are a special case of the general Lagrange interpolation formula (3.1.1).

Since it is (piecewise) linear, equation (3.3.1) has zero second derivative in the interior of each interval, and an undefined, or infinite, second derivative at the abscissas $x_j$. The goal of cubic spline interpolation is to get an interpolation formula that is smooth in the first derivative, and continuous in the second derivative, both within an interval and at its boundaries.

Suppose, contrary to fact, that in addition to the tabulated values of $y_i$, we also have tabulated values for the function's second derivatives, $y''$, that is, a set of numbers $y''_i$. Then, within each interval, we can add to the right-hand side of equation (3.3.1) a cubic polynomial whose second derivative varies linearly from a value $y''_j$ on the left to a value $y''_{j+1}$ on the right. Doing so, we will have the desired continuous second derivative. If we also construct the cubic polynomial to have zero values at $x_j$ and $x_{j+1}$, then adding it in will not spoil the agreement with the tabulated functional values $y_j$ and $y_{j+1}$ at the endpoints $x_j$ and $x_{j+1}$.

A little side calculation shows that there is only one way to arrange this construction, namely replacing (3.3.1) by

$$ y = A y_j + B y_{j+1} + C y''_j + D y''_{j+1} \qquad (3.3.3) $$

where A and B are defined in (3.3.2) and

$$ C \equiv \frac{1}{6}(A^3 - A)(x_{j+1} - x_j)^2 \qquad D \equiv \frac{1}{6}(B^3 - B)(x_{j+1} - x_j)^2 \qquad (3.3.4) $$

Notice that the dependence on the independent variable x in equations (3.3.3) and (3.3.4) is entirely through the linear x-dependence of A and B, and (through A and B) the cubic x-dependence of C and D.

We can readily check that $y''$ is in fact the second derivative of the new interpolating polynomial. We take derivatives of equation (3.3.3) with respect to x, using the definitions of A, B, C, D to compute $dA/dx$, $dB/dx$, $dC/dx$, and $dD/dx$. The result is

$$ \frac{dy}{dx} = \frac{y_{j+1} - y_j}{x_{j+1} - x_j} - \frac{3A^2 - 1}{6}(x_{j+1} - x_j)\, y''_j + \frac{3B^2 - 1}{6}(x_{j+1} - x_j)\, y''_{j+1} \qquad (3.3.5) $$

for the first derivative, and

$$ \frac{d^2 y}{dx^2} = A y''_j + B y''_{j+1} \qquad (3.3.6) $$

for the second derivative. Since $A = 1$ at $x_j$, $A = 0$ at $x_{j+1}$, while B is just the other way around, (3.3.6) shows that $y''$ is just the tabulated second derivative, and also that the second derivative will be continuous across (e.g.) the boundary between the two intervals $(x_{j-1}, x_j)$ and $(x_j, x_{j+1})$.

The only problem now is that we supposed the $y''_i$'s to be known, when, actually, they are not. However, we have not yet required that the first derivative, computed from equation (3.3.5), be continuous across the boundary between two intervals. The key idea of a cubic spline is to require this continuity and to use it to get equations for the second derivatives $y''_i$.

The required equations are obtained by setting equation (3.3.5) evaluated for $x = x_j$ in the interval $(x_{j-1}, x_j)$ equal to the same equation evaluated for $x = x_j$ but in the interval $(x_j, x_{j+1})$. With some rearrangement, this gives (for $j = 2, \ldots, N-1$)

$$ \frac{x_j - x_{j-1}}{6}\, y''_{j-1} + \frac{x_{j+1} - x_{j-1}}{3}\, y''_j + \frac{x_{j+1} - x_j}{6}\, y''_{j+1} = \frac{y_{j+1} - y_j}{x_{j+1} - x_j} - \frac{y_j - y_{j-1}}{x_j - x_{j-1}} \qquad (3.3.7) $$

These are $N - 2$ linear equations in the N unknowns $y''_i$, $i = 1, \ldots, N$. Therefore there is a two-parameter family of possible solutions.

For a unique solution, we need to specify two further conditions, typically taken as boundary conditions at $x_1$ and $x_N$. The most common ways of doing this are either

• set one or both of $y''_1$ and $y''_N$ equal to zero, giving the so-called natural cubic spline, which has zero second derivative on one or both of its boundaries, or
• set either of $y''_1$ and $y''_N$ to values calculated from equation (3.3.5) so as to make the first derivative of the interpolating function have a specified value on either or both boundaries.

One reason that cubic splines are especially practical is that the set of equations (3.3.7), along with the two additional boundary conditions, are not only linear, but also tridiagonal. Each $y''_j$ is coupled only to its nearest neighbors at $j \pm 1$. Therefore, the equations can be solved in $O(N)$ operations by the tridiagonal algorithm (§2.4). That algorithm is concise enough to build right into the spline calculational routine. This makes the routine not completely transparent as an implementation of (3.3.7), so we encourage you to study it carefully, comparing with tridag (§2.4). Arrays are assumed to be unit-offset. If you have zero-offset arrays, see §1.2.

#include "nrutil.h"

void spline(float x[], float y[], int n, float yp1, float ypn, float y2[])
/* Given arrays x[1..n] and y[1..n] containing a tabulated function, i.e., y_i = f(x_i), with
x_1 < x_2 < ... < x_N, and given values yp1 and ypn for the first derivative of the interpolating
function at points 1 and n, respectively, this routine returns an array y2[1..n] that contains
the second derivatives of the interpolating function at the tabulated points x_i. If yp1 and/or
ypn are equal to 1e30 or larger, the routine is signaled to set the corresponding boundary
condition for a natural spline, with zero second derivative on that boundary. */
{
    int i,k;
    float p,qn,sig,un,*u;

    u=vector(1,n-1);
    if (yp1 > 0.99e30)                  /* The lower boundary condition is set either to be "natural" */
        y2[1]=u[1]=0.0;
    else {                              /* or else to have a specified first derivative. */
        y2[1] = -0.5;
        u[1]=(3.0/(x[2]-x[1]))*((y[2]-y[1])/(x[2]-x[1])-yp1);
    }

    for (i=2;i<=n-1;i++) {              /* This is the decomposition loop of the tridiagonal
                                           algorithm.  y2 and u are used for temporary storage
                                           of the decomposed factors. */
        sig=(x[i]-x[i-1])/(x[i+1]-x[i-1]);
        p=sig*y2[i-1]+2.0;
        y2[i]=(sig-1.0)/p;
        u[i]=(y[i+1]-y[i])/(x[i+1]-x[i]) - (y[i]-y[i-1])/(x[i]-x[i-1]);
        u[i]=(6.0*u[i]/(x[i+1]-x[i-1])-sig*u[i-1])/p;
    }
    if (ypn > 0.99e30)                  /* The upper boundary condition is set either to be "natural" */
        qn=un=0.0;
    else {                              /* or else to have a specified first derivative. */
        qn=0.5;
        un=(3.0/(x[n]-x[n-1]))*(ypn-(y[n]-y[n-1])/(x[n]-x[n-1]));
    }
    y2[n]=(un-qn*u[n-1])/(qn*y2[n-1]+1.0);
    for (k=n-1;k>=1;k--)                /* This is the backsubstitution loop of the tridiagonal
                                           algorithm. */
        y2[k]=y2[k]*y2[k+1]+u[k];
    free_vector(u,1,n-1);
}

It is important to understand that the program spline is called only once to process an entire tabulated function in arrays $x_i$ and $y_i$. Once this has been done, values of the interpolated function for any value of x are obtained by calls (as many as desired) to a separate routine splint (for "spline interpolation"):

void splint(float xa[], float ya[], float y2a[], int n, float x, float *y)
/* Given the arrays xa[1..n] and ya[1..n], which tabulate a function (with the xa_i's in order),
and given the array y2a[1..n], which is the output from spline above, and given a value of
x, this routine returns a cubic-spline interpolated value y. */
{
    void nrerror(char error_text[]);
    int klo,khi,k;
    float h,b,a;

    klo=1;                              /* We will find the right place in the table by means of
                                           bisection.  This is optimal if sequential calls to this
                                           routine are at random values of x.  If sequential calls
                                           are in order, and closely spaced, one would do better
                                           to store previous values of klo and khi and test if
                                           they remain appropriate on the next call. */
    khi=n;
    while (khi-klo > 1) {
        k=(khi+klo) >> 1;
        if (xa[k] > x) khi=k;
        else klo=k;
    }                                   /* klo and khi now bracket the input value of x. */
    h=xa[khi]-xa[klo];
    if (h == 0.0) nrerror("Bad xa input to routine splint");    /* The xa's must be distinct. */
    a=(xa[khi]-x)/h;
    b=(x-xa[klo])/h;                    /* Cubic spline polynomial is now evaluated. */
    *y=a*ya[klo]+b*ya[khi]+((a*a*a-a)*y2a[klo]+(b*b*b-b)*y2a[khi])*(h*h)/6.0;
}

CITED REFERENCES AND FURTHER READING:
De Boor, C. 1978, A Practical Guide to Splines (New York: Springer-Verlag).
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), §§4.4–4.5.
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §2.4.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §3.8.
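Putting spline and splint above together, here is a minimal driver (our own test function and grid, with nrutil.c assumed linked in); note the one-time call to spline followed by repeated calls to splint:

#include <stdio.h>
#include <math.h>
#include "nrutil.h"

void spline(float x[], float y[], int n, float yp1, float ypn, float y2[]);
void splint(float xa[], float ya[], float y2a[], int n, float x, float *y);

int main(void)
{
    int i,n=10;
    float *x=vector(1,n),*y=vector(1,n),*y2=vector(1,n),xx,yy;

    for (i=1;i<=n;i++) {        /* tabulate sin(x) on [0.1, 1.0] */
        x[i]=0.1*i;
        y[i]=sin(x[i]);
    }
    spline(x,y,n,1.0e30,1.0e30,y2);     /* natural spline at both ends */
    for (xx=0.15;xx<1.0;xx+=0.2) {      /* interpolate at several points */
        splint(x,y,y2,n,xx,&yy);
        printf("x = %4.2f   spline %f   true %f\n",xx,yy,sin(xx));
    }
    free_vector(y2,1,n); free_vector(y,1,n); free_vector(x,1,n);
    return 0;
}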

3.4 How to Search an Ordered Table

Suppose that you have decided to use some particular interpolation scheme, such as fourth-order polynomial interpolation, to compute a function $f(x)$ from a set of tabulated $x_i$'s and $f_i$'s. Then you will need a fast way of finding your place in the table of $x_i$'s, given some particular value x at which the function evaluation is desired. This problem is not properly one of numerical analysis, but it occurs so often in practice that it would be negligent of us to ignore it.

Formally, the problem is this: Given an array of abscissas xx[j], j = 1, 2, ..., n, with the elements either monotonically increasing or monotonically decreasing, and given a number x, find an integer j such that x lies between xx[j] and xx[j+1]. For this task, let us define fictitious array elements xx[0] and xx[n+1] equal to plus or minus infinity (in whichever order is consistent with the monotonicity of the table). Then j will always be between 0 and n, inclusive; a value of 0 indicates "off-scale" at one end of the table, n indicates off-scale at the other end.

In most cases, when all is said and done, it is hard to do better than bisection, which will find the right place in the table in about $\log_2 n$ tries. We already did use bisection in the spline evaluation routine splint of the preceding section, so you might glance back at that. Standing by itself, a bisection routine looks like this:

void locate(float xx[], unsigned long n, float x, unsigned long *j)
/* Given an array xx[1..n], and given a value x, returns a value j such that x is between
xx[j] and xx[j+1].  xx must be monotonic, either increasing or decreasing.  j=0 or j=n is
returned to indicate that x is out of range. */
{
    unsigned long ju,jm,jl;
    int ascnd;

    jl=0;                               /* Initialize lower */
    ju=n+1;                             /* and upper limits. */
    ascnd=(xx[n] >= xx[1]);
    while (ju-jl > 1) {                 /* If we are not yet done, */
        jm=(ju+jl) >> 1;                /* compute a midpoint, */
        if (x >= xx[jm] == ascnd)
            jl=jm;                      /* and replace either the lower limit */
        else
            ju=jm;                      /* or the upper limit, as appropriate. */
    }                                   /* Repeat until the test condition is satisfied. */
    if (x == xx[1]) *j=1;               /* Then set the output */
    else if (x == xx[n]) *j=n-1;
    else *j=jl;
}                                       /* and return. */

A unit-offset array xx is assumed. To use locate with a zero-offset array, remember to subtract 1 from the address of xx, and also from the returned value j.

Search with Correlated Values

Sometimes you will be in the situation of searching a large table many times, and with nearly identical abscissas on consecutive searches. For example, you may be generating a function that is used on the right-hand side of a differential equation: Most differential-equation integrators, as we shall see in Chapter 16, call

Figure 3.4.1. (a) The routine locate finds a table entry by bisection. Shown here is the sequence of steps that converge to element 51 in a table of length 64. (b) The routine hunt searches from a previous known position in the table by increasing steps, then converges by bisection. Shown here is a particularly unfavorable example, converging to element 32 from element 7. A favorable example would be convergence to an element near 7, such as 9, which would require just three "hops."

for right-hand side evaluations at points that hop back and forth a bit, but whose trend moves slowly in the direction of the integration.

In such cases it is wasteful to do a full bisection, ab initio, on each call. The following routine instead starts with a guessed position in the table. It first "hunts," either up or down, in increments of 1, then 2, then 4, etc., until the desired value is bracketed. Second, it then bisects in the bracketed interval. At worst, this routine is about a factor of 2 slower than locate above (if the hunt phase expands to include the whole table). At best, it can be a factor of $\log_2 n$ faster than locate, if the desired point is usually quite close to the input guess. Figure 3.4.1 compares the two routines.

void hunt(float xx[], unsigned long n, float x, unsigned long *jlo)
/* Given an array xx[1..n], and given a value x, returns a value jlo such that x is between
xx[jlo] and xx[jlo+1].  xx[1..n] must be monotonic, either increasing or decreasing.
jlo=0 or jlo=n is returned to indicate that x is out of range.  jlo on input is taken as the
initial guess for jlo on output. */
{
    unsigned long jm,jhi,inc;
    int ascnd;

    ascnd=(xx[n] >= xx[1]);             /* True if ascending order of table, false otherwise. */
    if (*jlo <= 0 || *jlo > n) {        /* Input guess not useful.  Go immediately to bisection. */
        *jlo=0;
        jhi=n+1;
    } else {
        inc=1;                          /* Set the hunting increment. */
        if (x >= xx[*jlo] == ascnd) {   /* Hunt up: */
            if (*jlo == n) return;
            jhi=(*jlo)+1;
            while (x >= xx[jhi] == ascnd) {     /* Not done hunting, */
                *jlo=jhi;
                inc += inc;             /* so double the increment */
                jhi=(*jlo)+inc;
                if (jhi > n) {          /* Done hunting, since off end of table. */
                    jhi=n+1;
                    break;
                }                       /* Try again. */

            }                                /* Done hunting, value bracketed. */
        } else {                             /* Hunt down: */
            if (*jlo == 1) {
                *jlo=0;
                return;
            }
            jhi=(*jlo)--;
            while (x < xx[*jlo] == ascnd) {  /* Not done hunting, */
                jhi=(*jlo);
                inc <<= 1;                   /* so double the increment */
                if (inc >= jhi) {            /* Done hunting, since off end of table. */
                    *jlo=0;
                    break;
                }
                else *jlo=jhi-inc;           /* and try again. */
            }                                /* Done hunting, value bracketed. */
        }
    }                                        /* Hunt is done, so begin the final bisection phase: */
    while (jhi-(*jlo) != 1) {
        jm=(jhi+(*jlo)) >> 1;
        if (x >= xx[jm] == ascnd)
            *jlo=jm;
        else
            jhi=jm;
    }
    if (x == xx[n]) *jlo=n-1;
    if (x == xx[1]) *jlo=1;
}

If your array xx is zero-offset, read the comment following locate, above.

After the Hunt

The problem: Routines locate and hunt return an index j such that your desired value lies between table entries xx[j] and xx[j+1], where xx[1..n] is the full length of the table. But, to obtain an m-point interpolated value using a routine like polint (§3.1) or ratint (§3.2), you need to supply much shorter xx and yy arrays, of length m. How do you make the connection?

The solution: Calculate

    k = IMIN(IMAX(j-(m-1)/2,1),n+1-m)

(The macros IMIN and IMAX give the minimum and maximum of two integer arguments; see §1.2 and Appendix B.) This expression produces the index of the leftmost member of an m-point set of points centered (insofar as possible) between j and j+1, but bounded by 1 at the left and n at the right. C then lets you call the interpolation routine with array addresses offset by k, e.g.,

    polint(&xx[k-1],&yy[k-1],m, ... )

CITED REFERENCES AND FURTHER READING:
Knuth, D.E. 1973, Sorting and Searching, vol. 3 of The Art of Computer Programming (Reading, MA: Addison-Wesley), §6.2.1.
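To make the connection concrete, here is a minimal sketch that strings hunt together with polint in exactly this way. The tabulated sine and the slowly drifting sequence of abscissas are illustrative stand-ins only, and the driver itself is ours, not one of this chapter's routines; it assumes hunt, polint, and the nrutil routines are compiled alongside.

#include <stdio.h>
#include <math.h>
#include "nrutil.h"

#define N 64
#define M 4                     /* 4-point (cubic) interpolation */

void hunt(float xx[], unsigned long n, float x, unsigned long *jlo);
void polint(float xa[], float ya[], int n, float x, float *y, float *dy);

int main(void)
{
    float *xx,*yy,x,y,dy;
    unsigned long j,jlo=0;      /* jlo persists between calls: the correlated guess */
    int i,k;

    xx=vector(1,N);
    yy=vector(1,N);
    for (j=1;j<=N;j++) {        /* build a hypothetical table */
        xx[j]=0.1*j;
        yy[j]=sin(xx[j]);
    }
    for (i=0;i<=20;i++) {       /* drifting abscissas, as in an ODE integration */
        x=0.5+0.25*i;
        hunt(xx,N,x,&jlo);
        k=IMIN(IMAX((int)jlo-(M-1)/2,1),N+1-M);  /* leftmost point of the m-point set */
        polint(&xx[k-1],&yy[k-1],M,x,&y,&dy);
        printf("x=%6.3f  interpolated=%10.6f  true=%10.6f\n",x,y,sin(x));
    }
    free_vector(yy,1,N);
    free_vector(xx,1,N);
    return 0;
}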

3.5 Coefficients of the Interpolating Polynomial

Occasionally you may wish to know not the value of the interpolating polynomial that passes through a (small!) number of points, but the coefficients of that polynomial. A valid use of the coefficients might be, for example, to compute simultaneous interpolated values of the function and of several of its derivatives (see §5.3), or to convolve a segment of the tabulated function with some other function, where the moments of that other function (i.e., its convolution with powers of x) are known analytically.

However, please be certain that the coefficients are what you need. Generally the coefficients of the interpolating polynomial can be determined much less accurately than its value at a desired abscissa. Therefore it is not a good idea to determine the coefficients only for use in calculating interpolating values. Values thus calculated will not pass exactly through the tabulated points, for example, while values computed by the routines in §3.1–§3.3 will pass exactly through such points.

Also, you should not mistake the interpolating polynomial (and its coefficients) for its cousin, the best fit polynomial through a data set. Fitting is a smoothing process, since the number of fitted coefficients is typically much less than the number of data points. Therefore, fitted coefficients can be accurately and stably determined even in the presence of statistical errors in the tabulated values. (See §14.8.) Interpolation, where the number of coefficients and number of tabulated points are equal, takes the tabulated values as perfect. If they in fact contain statistical errors, these can be magnified into oscillations of the interpolating polynomial in between the tabulated points.

As before, we take the tabulated points to be y_i ≡ y(x_i). If the interpolating polynomial is written as

\[ y = c_0 + c_1 x + c_2 x^2 + \cdots + c_N x^N \qquad (3.5.1) \]

then the c_i's are required to satisfy the linear equation

\[ \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^N \\ 1 & x_1 & x_1^2 & \cdots & x_1^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^N \end{bmatrix} \cdot \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_N \end{bmatrix} \qquad (3.5.2) \]

This is a Vandermonde matrix, as described in §2.8. One could in principle solve equation (3.5.2) by standard techniques for linear equations generally (§2.3); however the special method that was derived in §2.8 is more efficient by a large factor, of order N, so it is much better.

Remember that Vandermonde systems can be quite ill-conditioned. In such a case, no numerical method is going to give a very accurate answer. Such cases do not, please note, imply any difficulty in finding interpolated values by the methods of §3.1, but only difficulty in finding coefficients.

Like the routine in §2.8, the following is due to G.B. Rybicki.
Note that the arrays are all assumed to be zero-offset.

#include "nrutil.h"

void polcoe(float x[], float y[], int n, float cof[])
Given arrays x[0..n] and y[0..n] containing a tabulated function y_i = f(x_i), this routine
returns an array of coefficients cof[0..n], such that y_i = sum_j cof_j x_i^j.
{
    int k,j,i;
    float phi,ff,b,*s;

    s=vector(0,n);
    for (i=0;i<=n;i++) s[i]=cof[i]=0.0;
    s[n] = -x[0];
    for (i=1;i<=n;i++) {            /* Coefficients s_i of the master polynomial P(x) */
        for (j=n-i;j<=n-1;j++)      /* are found by recurrence. */
            s[j] -= x[i]*s[j+1];
        s[n] -= x[i];
    }
    for (j=0;j<=n;j++) {
        phi=n+1;
        for (k=n;k>=1;k--)          /* The quantity phi = prod_{j!=k}(x_j - x_k) is */
            phi=k*s[k]+x[j]*phi;    /* found as a derivative of P(x_j). */
        ff=y[j]/phi;
        b=1.0;                      /* Coefficients of polynomials in each term of the */
        for (k=n;k>=0;k--) {        /* Lagrange formula are found by synthetic division */
            cof[k] += b*ff;         /* of P(x) by (x - x_j). The solution c_k is */
            b=s[k]+x[j]*b;          /* accumulated. */
        }
    }
    free_vector(s,0,n);
}

Another Method

Another technique is to make use of the function value interpolation routine already given (polint of §3.1). If we interpolate (or extrapolate) to find the value of the interpolating polynomial at x = 0, then this value will evidently be c_0. Now we can subtract c_0 from the y_i's and divide each by its corresponding x_i. Throwing out one point (the one with smallest x_i is a good candidate), we can repeat the procedure to find c_1, and so on.

It is not instantly obvious that this procedure is stable, but we have generally found it to be somewhat more stable than the routine immediately preceding. This method is of order N^3, while the preceding one was of order N^2. You will find, however, that neither works very well for large N, because of the intrinsic ill-condition of the Vandermonde problem. In single precision, N up to 8 or 10 is satisfactory; about double this in double precision.

#include <math.h>
#include "nrutil.h"

void polcof(float xa[], float ya[], int n, float cof[])
Given arrays xa[0..n] and ya[0..n] containing a tabulated function ya_i = f(xa_i), this
routine returns an array of coefficients cof[0..n] such that ya_i = sum_j cof_j xa_i^j.
{
    void polint(float xa[], float ya[], int n, float x, float *y, float *dy);
    int k,j,i;
    float xmin,dy,*x,*y;

    x=vector(0,n);
    y=vector(0,n);
    for (j=0;j<=n;j++) {
        x[j]=xa[j];
        y[j]=ya[j];
    }
    for (j=0;j<=n;j++) {
        /* Subtract 1 from the pointers to x and y because polint uses dimensions
           [1..n]. We extrapolate to x = 0. */
        polint(x-1,y-1,n+1-j,0.0,&cof[j],&dy);
        xmin=1.0e38;
        k = -1;
        for (i=0;i<=n-j;i++) {      /* Find the remaining x_i of smallest absolute value, */
            if (fabs(x[i]) < xmin) {
                xmin=fabs(x[i]);
                k=i;
            }
            if (x[i]) y[i]=(y[i]-cof[j])/x[i];  /* (meanwhile reducing all the terms) */
        }
        for (i=k+1;i<=n-j;i++) {    /* and eliminate it. */
            y[i-1]=y[i];
            x[i-1]=x[i];
        }
    }
    free_vector(y,0,n);
    free_vector(x,0,n);
}

If the point x = 0 is not in (or at least close to) the range of the tabulated x_i's, then the coefficients of the interpolating polynomial will in general become very large. However, the real "information content" of the coefficients is in small differences from the "translation-induced" large values. This is one cause of ill-conditioning, resulting in loss of significance and poorly determined coefficients. You should consider redefining the origin of the problem, to put x = 0 in a sensible place.

Another pathology is that, if too high a degree of interpolation is attempted on a smooth function, the interpolating polynomial will attempt to use its high-degree coefficients, in combinations with large and almost precisely canceling combinations, to match the tabulated values down to the last possible epsilon of accuracy. This effect is the same as the intrinsic tendency of the interpolating polynomial values to oscillate (wildly) between its constrained points, and would be present even if the machine's floating precision were infinitely good. The above routines polcoe and polcof have slightly different sensitivities to the pathologies that can occur.

Are you still quite certain that using the coefficients is a good idea?

CITED REFERENCES AND FURTHER READING:
Isaacson, E., and Keller, H.B. 1966, Analysis of Numerical Methods (New York: Wiley), §5.2.

3.6 Interpolation in Two or More Dimensions

In multidimensional interpolation, we seek an estimate of y(x_1, x_2, ..., x_n) from an n-dimensional grid of tabulated values y and n one-dimensional vectors giving the tabulated values of each of the independent variables x_1, x_2, ..., x_n. We will not here consider the problem of interpolating on a mesh that is not Cartesian, i.e., has tabulated function values at "random" points in n-dimensional space rather than at the vertices of a rectangular array. For clarity, we will consider explicitly only the case of two dimensions, the cases of three or more dimensions being analogous in every way.

In two dimensions, we imagine that we are given a matrix of functional values ya[1..m][1..n]. We are also given an array x1a[1..m], and an array x2a[1..n]. The relation of these input quantities to an underlying function y(x_1, x_2) is

    ya[j][k] = y(x1a[j], x2a[k])                        (3.6.1)

We want to estimate, by interpolation, the function y at some untabulated point (x_1, x_2).

An important concept is that of the grid square in which the point (x_1, x_2) falls, that is, the four tabulated points that surround the desired interior point. For convenience, we will number these points from 1 to 4, counterclockwise starting from the lower left (see Figure 3.6.1). More precisely, if

    x1a[j] ≤ x_1 ≤ x1a[j+1]
    x2a[k] ≤ x_2 ≤ x2a[k+1]                             (3.6.2)

defines j and k, then

    y_1 ≡ ya[j][k]
    y_2 ≡ ya[j+1][k]
    y_3 ≡ ya[j+1][k+1]
    y_4 ≡ ya[j][k+1]                                    (3.6.3)

The simplest interpolation in two dimensions is bilinear interpolation on the grid square. Its formulas are:

    t ≡ (x_1 − x1a[j]) / (x1a[j+1] − x1a[j])
    u ≡ (x_2 − x2a[k]) / (x2a[k+1] − x2a[k])            (3.6.4)

(so that t and u each lie between 0 and 1), and

\[ y(x_1,x_2) = (1-t)(1-u)\,y_1 + t(1-u)\,y_2 + tu\,y_3 + (1-t)u\,y_4 \qquad (3.6.5) \]

Bilinear interpolation is frequently "close enough for government work."
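Equation (3.6.5) translates almost directly into code. The following fragment is a sketch only, not one of this chapter's routines (the name bilinear is ours): it assumes unit-offset arrays as elsewhere in the chapter, and that the bracketing indices j and k of equation (3.6.2) have already been found, e.g., by locate or hunt of §3.4.

/* Bilinear interpolation on one grid square (equations 3.6.4-3.6.5).
   A sketch, not an NR routine; j,k must already bracket (x1,x2). */
float bilinear(float x1a[], float x2a[], float **ya,
    int j, int k, float x1, float x2)
{
    float t,u;

    t=(x1-x1a[j])/(x1a[j+1]-x1a[j]);    /* equation (3.6.4) */
    u=(x2-x2a[k])/(x2a[k+1]-x2a[k]);
    return (1.0-t)*(1.0-u)*ya[j][k]     /* equation (3.6.5) */
        +t*(1.0-u)*ya[j+1][k]
        +t*u*ya[j+1][k+1]
        +(1.0-t)*u*ya[j][k+1];
}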

As the interpolating point wanders from grid square to grid square, the interpolated function value changes continuously. However, the gradient of the interpolated function changes discontinuously at the boundaries of each grid square.

Figure 3.6.1. (a) Labeling of points used in the two-dimensional interpolation routines bcuint and bcucof. (b) For each of the four points in (a), the user supplies one function value, two first derivatives, and one cross-derivative, a total of 16 numbers.

There are two distinctly different directions that one can take in going beyond bilinear interpolation to higher-order methods: One can use higher order to obtain increased accuracy for the interpolated function (for sufficiently smooth functions!), without necessarily trying to fix up the continuity of the gradient and higher derivatives. Or, one can make use of higher order to enforce smoothness of some of these derivatives as the interpolating point crosses grid-square boundaries. We will now consider each of these two directions in turn.

Higher Order for Accuracy

The basic idea is to break up the problem into a succession of one-dimensional interpolations. If we want to do m-1 order interpolation in the x_1 direction, and n-1 order in the x_2 direction, we first locate an m × n sub-block of the tabulated function matrix that contains our desired point (x_1, x_2). We then do m one-dimensional interpolations in the x_2 direction, i.e., on the rows of the sub-block, to get function values at the points (x1a[j], x_2), j = 1, ..., m. Finally, we do a last interpolation in the x_1 direction to get the answer. If we use the polynomial interpolation routine polint of §3.1, and a sub-block which is presumed to be already located (and addressed through the pointer float **ya, see §1.2), the procedure looks like this:

#include "nrutil.h"

void polin2(float x1a[], float x2a[], float **ya, int m, int n, float x1,
    float x2, float *y, float *dy)
Given arrays x1a[1..m] and x2a[1..n] of independent variables, and a submatrix of function
values ya[1..m][1..n], tabulated at the grid points defined by x1a and x2a; and given values
x1 and x2 of the independent variables; this routine returns an interpolated function value y,
and an accuracy indication dy (based only on the interpolation in the x1 direction, however).
{
    void polint(float xa[], float ya[], int n, float x, float *y, float *dy);

    int j;
    float *ymtmp;

    ymtmp=vector(1,m);
    for (j=1;j<=m;j++) {                      /* Loop over rows. */
        polint(x2a,ya[j],n,x2,&ymtmp[j],dy);  /* Interpolate answer into temporary storage. */
    }
    polint(x1a,ymtmp,m,x1,y,dy);              /* Do the final interpolation. */
    free_vector(ymtmp,1,m);
}

Higher Order for Smoothness: Bicubic Interpolation

We will give two methods that are in common use, and which are themselves not unrelated. The first is usually called bicubic interpolation.

Bicubic interpolation requires the user to specify at each grid point not just the function y(x_1, x_2), but also the gradients ∂y/∂x_1 ≡ y_{,1}, ∂y/∂x_2 ≡ y_{,2}, and the cross derivative ∂²y/∂x_1∂x_2 ≡ y_{,12}. Then an interpolating function that is cubic in the scaled coordinates t and u (equation 3.6.4) can be found, with the following properties: (i) The values of the function and the specified derivatives are reproduced exactly on the grid points, and (ii) the values of the function and the specified derivatives change continuously as the interpolating point crosses from one grid square to another.

It is important to understand that nothing in the equations of bicubic interpolation requires you to specify the extra derivatives correctly! The smoothness properties are tautologically "forced," and have nothing to do with the "accuracy" of the specified derivatives. It is a separate problem for you to decide how to obtain the values that are specified. The better you do, the more accurate the interpolation will be. But it will be smooth no matter what you do.

Best of all is to know the derivatives analytically, or to be able to compute them accurately by numerical means, at the grid points. Next best is to determine them by numerical differencing from the functional values already tabulated on the grid. The relevant code would be something like this (using centered differencing):

y1a[j][k]=(ya[j+1][k]-ya[j-1][k])/(x1a[j+1]-x1a[j-1]);
y2a[j][k]=(ya[j][k+1]-ya[j][k-1])/(x2a[k+1]-x2a[k-1]);
y12a[j][k]=(ya[j+1][k+1]-ya[j+1][k-1]-ya[j-1][k+1]+ya[j-1][k-1])
    /((x1a[j+1]-x1a[j-1])*(x2a[k+1]-x2a[k-1]));

To do a bicubic interpolation within a grid square, given the function y and the derivatives y1, y2, y12 at each of the four corners of the square, there are two steps: First obtain the sixteen quantities c_ij, i,j = 1,...,4 using the routine bcucof below. (The formulas that obtain the c's from the function and derivative values are just a complicated linear transformation, with coefficients which, having been determined once in the mists of numerical history, can be tabulated and forgotten.) Next, substitute the c's into any or all of the following bicubic formulas for function and derivatives, as desired:

\[
\begin{aligned}
y(x_1,x_2) &= \sum_{i=1}^{4}\sum_{j=1}^{4} c_{ij}\, t^{\,i-1} u^{\,j-1} \\
y_{,1}(x_1,x_2) &= \sum_{i=1}^{4}\sum_{j=1}^{4} (i-1)\, c_{ij}\, t^{\,i-2} u^{\,j-1}\,(dt/dx_1) \\
y_{,2}(x_1,x_2) &= \sum_{i=1}^{4}\sum_{j=1}^{4} (j-1)\, c_{ij}\, t^{\,i-1} u^{\,j-2}\,(du/dx_2) \\
y_{,12}(x_1,x_2) &= \sum_{i=1}^{4}\sum_{j=1}^{4} (i-1)(j-1)\, c_{ij}\, t^{\,i-2} u^{\,j-2}\,(dt/dx_1)(du/dx_2)
\end{aligned}
\qquad (3.6.6)
\]

where t and u are again given by equation (3.6.4).

void bcucof(float y[], float y1[], float y2[], float y12[], float d1, float d2,
    float **c)
Given arrays y[1..4], y1[1..4], y2[1..4], and y12[1..4], containing the function, gra-
dients, and cross derivative at the four grid points of a rectangular grid cell (numbered coun-
terclockwise from the lower left), and given d1 and d2, the length of the grid cell in the 1- and
2-directions, this routine returns the table c[1..4][1..4] that is used by routine bcuint
for bicubic interpolation.
{
    static int wt[16][16]=
        { 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,
        -3,0,0,3,0,0,0,0,-2,0,0,-1,0,0,0,0,
        2,0,0,-2,0,0,0,0,1,0,0,1,0,0,0,0,
        0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
        0,0,0,0,-3,0,0,3,0,0,0,0,-2,0,0,-1,
        0,0,0,0,2,0,0,-2,0,0,0,0,1,0,0,1,
        -3,3,0,0,-2,-1,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,-3,3,0,0,-2,-1,0,0,
        9,-9,9,-9,6,3,-3,-6,6,-6,-3,3,4,2,1,2,
        -6,6,-6,6,-4,-2,2,4,-3,3,3,-3,-2,-1,-1,-2,
        2,-2,0,0,1,1,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,2,-2,0,0,1,1,0,0,
        -6,6,-6,6,-3,-3,3,3,-4,4,2,-2,-2,-2,-1,-1,
        4,-4,4,-4,2,2,-2,-2,2,-2,-2,2,1,1,1,1};
    int l,k,j,i;
    float xx,d1d2,cl[16],x[16];

    d1d2=d1*d2;
    for (i=1;i<=4;i++) {        /* Pack a temporary vector x. */
        x[i-1]=y[i];
        x[i+3]=y1[i]*d1;
        x[i+7]=y2[i]*d2;
        x[i+11]=y12[i]*d1d2;
    }
    for (i=0;i<=15;i++) {       /* Matrix multiply by the stored table. */
        xx=0.0;
        for (k=0;k<=15;k++) xx += wt[i][k]*x[k];
        cl[i]=xx;
    }
    l=0;
    for (i=1;i<=4;i++)          /* Unpack the result into the output table. */
        for (j=1;j<=4;j++) c[i][j]=cl[l++];
}

The implementation of equation (3.6.6), which performs a bicubic interpolation, gives back the interpolated function value and the two gradient values, and uses the above routine bcucof, is simply:

#include "nrutil.h"

void bcuint(float y[], float y1[], float y2[], float y12[], float x1l,
    float x1u, float x2l, float x2u, float x1, float x2, float *ansy,
    float *ansy1, float *ansy2)
Bicubic interpolation within a grid square. Input quantities are y,y1,y2,y12 (as described in
bcucof); x1l and x1u, the lower and upper coordinates of the grid square in the 1-direction;
x2l and x2u likewise for the 2-direction; and x1,x2, the coordinates of the desired point for
the interpolation. The interpolated function value is returned as ansy, and the interpolated
gradient values as ansy1 and ansy2. This routine calls bcucof.
{
    void bcucof(float y[], float y1[], float y2[], float y12[], float d1,
        float d2, float **c);
    int i;
    float t,u,d1,d2,**c;

    c=matrix(1,4,1,4);
    d1=x1u-x1l;
    d2=x2u-x2l;
    bcucof(y,y1,y2,y12,d1,d2,c);    /* Get the c's. */
    if (x1u == x1l || x2u == x2l) nrerror("Bad input in routine bcuint");
    t=(x1-x1l)/d1;                  /* Equation (3.6.4). */
    u=(x2-x2l)/d2;
    *ansy=(*ansy2)=(*ansy1)=0.0;
    for (i=4;i>=1;i--) {            /* Equation (3.6.6). */
        *ansy=t*(*ansy)+((c[i][4]*u+c[i][3])*u+c[i][2])*u+c[i][1];
        *ansy2=t*(*ansy2)+(3.0*c[i][4]*u+2.0*c[i][3])*u+c[i][2];
        *ansy1=u*(*ansy1)+(3.0*c[4][i]*t+2.0*c[3][i])*t+c[2][i];
    }
    *ansy1 /= d1;
    *ansy2 /= d2;
    free_matrix(c,1,4,1,4);
}
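A short usage sketch may help. The corner data below are for the test function y = x_1 x_2 on the unit square, whose derivatives are known analytically, so the bicubic interpolant should reproduce both the function and the gradients exactly; the numbers and the driver are ours, illustrative only, and assume bcuint, bcucof, and nrutil are compiled alongside.

#include <stdio.h>

void bcuint(float y[], float y1[], float y2[], float y12[],
    float x1l, float x1u, float x2l, float x2u,
    float x1, float x2, float *ansy, float *ansy1, float *ansy2);

int main(void)
{
    /* Corners numbered counterclockwise from lower left:
       (0,0),(1,0),(1,1),(0,1); element [0] of each array is an unused dummy. */
    float y[5]  ={0.0, 0.0,0.0,1.0,0.0};   /* f = x1*x2 at the corners */
    float y1[5] ={0.0, 0.0,0.0,1.0,1.0};   /* df/dx1 = x2 */
    float y2[5] ={0.0, 0.0,1.0,1.0,0.0};   /* df/dx2 = x1 */
    float y12[5]={0.0, 1.0,1.0,1.0,1.0};   /* d2f/dx1dx2 = 1 */
    float ansy,ansy1,ansy2;

    bcuint(y,y1,y2,y12,0.0,1.0,0.0,1.0,0.4,0.7,&ansy,&ansy1,&ansy2);
    printf("f=%g (expect 0.28)  df/dx1=%g (expect 0.7)  df/dx2=%g (expect 0.4)\n",
        ansy,ansy1,ansy2);
    return 0;
}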

Higher Order for Smoothness: Bicubic Spline

The other common technique for obtaining smoothness in two-dimensional interpolation is the bicubic spline. Actually, this is equivalent to a special case of bicubic interpolation: The interpolating function is of the same functional form as equation (3.6.6); the values of the derivatives at the grid points are, however, determined "globally" by one-dimensional splines. However, bicubic splines are usually implemented in a form that looks rather different from the above bicubic interpolation routines, instead looking much closer in form to the routine polin2 above: To interpolate one functional value, one performs m one-dimensional splines across the rows of the table, followed by one additional one-dimensional spline down the newly created column. It is a matter of taste (and trade-off between time and memory) as to how much of this process one wants to precompute and store. Instead of precomputing and storing all the derivative information (as in bicubic interpolation), spline users typically precompute and store only one auxiliary table, of second derivatives in one direction only. Then one need only do spline evaluations (not constructions) for the m row splines; one must still do a construction and an evaluation for the final column spline. (Recall that a spline construction is a process of order N, while a spline evaluation is only of order log N — and that is just to find the place in the table!)

Here is a routine to precompute the auxiliary second-derivative table:

void splie2(float x1a[], float x2a[], float **ya, int m, int n, float **y2a)
Given an m by n tabulated function ya[1..m][1..n], and tabulated independent variables
x2a[1..n], this routine constructs one-dimensional natural cubic splines of the rows of ya
and returns the second-derivatives in the array y2a[1..m][1..n]. (The array x1a[1..m] is
included in the argument list merely for consistency with routine splin2.)
{
    void spline(float x[], float y[], int n, float yp1, float ypn, float y2[]);
    int j;

    for (j=1;j<=m;j++)
        spline(x2a,ya[j],n,1.0e30,1.0e30,y2a[j]);  /* Values 1e30 signal a natural spline. */
}

(If you want to interpolate on a sub-block of a bigger matrix, see §1.2.)

After the above routine has been executed once, any number of bicubic spline interpolations can be performed by successive calls of the following routine:

#include "nrutil.h"

void splin2(float x1a[], float x2a[], float **ya, float **y2a, int m, int n,
    float x1, float x2, float *y)
Given x1a, x2a, ya, m, n as described in splie2 and y2a as produced by that routine; and
given a desired interpolating point x1, x2; this routine returns an interpolated function value y
by bicubic spline interpolation.
{
    void spline(float x[], float y[], int n, float yp1, float ypn, float y2[]);
    void splint(float xa[], float ya[], float y2a[], int n, float x, float *y);
    int j;
    float *ytmp,*yytmp;

    ytmp=vector(1,m);
    yytmp=vector(1,m);
    /* Perform m evaluations of the row splines constructed by splie2, using
       the one-dimensional spline evaluator splint. */
    for (j=1;j<=m;j++)
        splint(x2a,ya[j],y2a[j],n,x2,&yytmp[j]);
    spline(x1a,yytmp,m,1.0e30,1.0e30,ytmp);   /* Construct the one-dimensional column spline */
    splint(x1a,yytmp,ytmp,m,x1,y);            /* and evaluate it. */
    free_vector(yytmp,1,m);
    free_vector(ytmp,1,m);
}

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §25.2.
Kinahan, B.F., and Harm, R. 1975, Astrophysical Journal, vol. 200, pp. 330–335.
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §5.2.7.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §7.7.

Chapter 4. Integration of Functions

4.0 Introduction

Numerical integration, which is also called quadrature, has a history extending back to the invention of calculus and before. The fact that integrals of elementary functions could not, in general, be computed analytically, while derivatives could be, served to give the field a certain panache, and to set it a cut above the arithmetic drudgery of numerical analysis during the whole of the 18th and 19th centuries.

With the invention of automatic computing, quadrature became just one numerical task among many, and not a very interesting one at that. Automatic computing, even the most primitive sort involving desk calculators and rooms full of "computers" (that were, until the 1950s, people rather than machines), opened to feasibility the much richer field of numerical integration of differential equations. Quadrature is merely the simplest special case: The evaluation of the integral

\[ I = \int_a^b f(x)\,dx \qquad (4.0.1) \]

is precisely equivalent to solving for the value I ≡ y(b) the differential equation

\[ \frac{dy}{dx} = f(x) \qquad (4.0.2) \]

with the boundary condition

\[ y(a) = 0 \qquad (4.0.3) \]

Chapter 16 of this book deals with the numerical integration of differential equations. In that chapter, much emphasis is given to the concept of "variable" or "adaptive" choices of stepsize. We will not, therefore, develop that material here. If the function that you propose to integrate is sharply concentrated in one or more peaks, or if its shape is not readily characterized by a single length-scale, then it is likely that you should cast the problem in the form of (4.0.2)–(4.0.3) and use the methods of Chapter 16.

The quadrature methods in this chapter are based, in one way or another, on the obvious device of adding up the value of the integrand at a sequence of abscissas within the range of integration. The game is to obtain the integral as accurately as possible with the smallest number of function evaluations of the integrand.

Just as in the case of interpolation (Chapter 3), one has the freedom to choose methods of various orders, with higher order sometimes, but not always, giving higher accuracy. "Romberg integration," which is discussed in §4.3, is a general formalism for making use of integration methods of a variety of different orders, and we recommend it highly.

Apart from the methods of this chapter and of Chapter 16, there are yet other methods for obtaining integrals. One important class is based on function approximation. We discuss explicitly the integration of functions by Chebyshev approximation ("Clenshaw-Curtis" quadrature) in §5.9. Although not explicitly discussed here, you ought to be able to figure out how to do cubic spline quadrature using the output of the routine spline in §3.3. (Hint: Integrate equation 3.3.3 over x analytically. See [1].)

Some integrals related to Fourier transforms can be calculated using the fast Fourier transform (FFT) algorithm. This is discussed in §13.9.

Multidimensional integrals are another whole, multidimensional bag of worms. Section 4.6 is an introductory discussion in this chapter; the important technique of Monte-Carlo integration is treated in Chapter 7.

CITED REFERENCES AND FURTHER READING:
Carnahan, B., Luther, H.A., and Wilkes, J.O. 1969, Applied Numerical Methods (New York: Wiley), Chapter 2.
Isaacson, E., and Keller, H.B. 1966, Analysis of Numerical Methods (New York: Wiley), Chapter 7.
Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 4.
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 3.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), Chapter 4.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §7.4.
Kahaner, D., Moler, C., and Nash, S. 1989, Numerical Methods and Software (Englewood Cliffs, NJ: Prentice Hall), Chapter 5.
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), §5.2, p. 89. [1]
Davis, P., and Rabinowitz, P. 1984, Methods of Numerical Integration, 2nd ed. (Orlando, FL: Academic Press).

4.1 Classical Formulas for Equally Spaced Abscissas

Where would any book on numerical analysis be without Mr. Simpson and his "rule"? The classical formulas for integrating a function whose value is known at equally spaced steps have a certain elegance about them, and they are redolent with historical association. Through them, the modern numerical analyst communes with the spirits of his or her predecessors back across the centuries, as far as the time of Newton, if not farther.

Alas, times do change; with the exception of two of the most modest formulas ("extended trapezoidal rule," equation 4.1.11, and "extended midpoint rule," equation 4.1.19, see §4.2), the classical formulas are almost entirely useless. They are museum pieces, but beautiful ones.

Figure 4.1.1. Quadrature formulas with equally spaced abscissas compute the integral of a function between x_0 and x_{N+1}. Closed formulas evaluate the function on the boundary points, while open formulas refrain from doing so (useful if the evaluation algorithm breaks down on the boundary points).

Some notation: We have a sequence of abscissas, denoted x_0, x_1, ..., x_N, x_{N+1}, which are spaced apart by a constant step h,

\[ x_i = x_0 + ih \qquad i = 0, 1, \ldots, N+1 \qquad (4.1.1) \]

A function f(x) has known values at the x_i's,

\[ f(x_i) \equiv f_i \qquad (4.1.2) \]

We want to integrate the function f(x) between a lower limit a and an upper limit b, where a and b are each equal to one or the other of the x_i's. An integration formula that uses the value of the function at the endpoints, f(a) or f(b), is called a closed formula. Occasionally, we want to integrate a function whose value at one or both endpoints is difficult to compute (e.g., the computation of f goes to a limit of zero over zero there, or worse yet has an integrable singularity there). In this case we want an open formula, which estimates the integral using only x_i's strictly between a and b (see Figure 4.1.1).

The basic building blocks of the classical formulas are rules for integrating a function over a small number of intervals. As that number increases, we can find rules that are exact for polynomials of increasingly high order. (Keep in mind that higher order does not always imply higher accuracy in real cases.) A sequence of such closed formulas is now given.

Closed Newton-Cotes Formulas

Trapezoidal rule:

\[ \int_{x_1}^{x_2} f(x)\,dx = h\left[\frac{1}{2}f_1 + \frac{1}{2}f_2\right] + O(h^3 f'') \qquad (4.1.3) \]

Here the error term O( ) signifies that the true answer differs from the estimate by an amount that is the product of some numerical coefficient times h^3 times the value of the function's second derivative somewhere in the interval of integration.

The coefficient is knowable, and it can be found in all the standard references on this subject. The point at which the second derivative is to be evaluated is, however, unknowable. If we knew it, we could evaluate the function there and have a higher-order method! Since the product of a knowable and an unknowable is unknowable, we will streamline our formulas and write only O( ), instead of the coefficient.

Equation (4.1.3) is a two-point formula (x_1 and x_2). It is exact for polynomials up to and including degree 1, i.e., f(x) = x. One anticipates that there is a three-point formula exact up to polynomials of degree 2. This is true; moreover, by a cancellation of coefficients due to left-right symmetry of the formula, the three-point formula is exact for polynomials up to and including degree 3, i.e., f(x) = x^3:

Simpson's rule:

\[ \int_{x_1}^{x_3} f(x)\,dx = h\left[\frac{1}{3}f_1 + \frac{4}{3}f_2 + \frac{1}{3}f_3\right] + O(h^5 f^{(4)}) \qquad (4.1.4) \]

Here f^{(4)} means the fourth derivative of the function f evaluated at an unknown place in the interval. Note also that the formula gives the integral over an interval of size 2h, so the coefficients add up to 2.

There is no lucky cancellation in the four-point formula, so it is also exact for polynomials up to and including degree 3.

Simpson's 3/8 rule:

\[ \int_{x_1}^{x_4} f(x)\,dx = h\left[\frac{3}{8}f_1 + \frac{9}{8}f_2 + \frac{9}{8}f_3 + \frac{3}{8}f_4\right] + O(h^5 f^{(4)}) \qquad (4.1.5) \]

The five-point formula again benefits from a cancellation:

Bode's rule:

\[ \int_{x_1}^{x_5} f(x)\,dx = h\left[\frac{14}{45}f_1 + \frac{64}{45}f_2 + \frac{24}{45}f_3 + \frac{64}{45}f_4 + \frac{14}{45}f_5\right] + O(h^7 f^{(6)}) \qquad (4.1.6) \]

This is exact for polynomials up to and including degree 5.

At this point the formulas stop being named after famous personages, so we will not go any further. Consult [1] for additional formulas in the sequence.
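The claimed degree of exactness is easy to check numerically. The following throwaway fragment (ours, not one of this chapter's routines) applies equation (4.1.4) to f(x) = x^3, which it integrates exactly, and to f(x) = x^4, which it does not:

/* Check that Simpson's rule (4.1.4) is exact for f(x)=x^3 (degree 3),
   but not for f(x)=x^4. A demonstration only, not an NR routine. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double h=0.5,x1=1.0,x2=x1+h,x3=x1+2*h;
    double simp3,simp4,exact3,exact4;

    simp3=h*(pow(x1,3)+4.0*pow(x2,3)+pow(x3,3))/3.0;
    exact3=(pow(x3,4)-pow(x1,4))/4.0;        /* integral of x^3 */
    simp4=h*(pow(x1,4)+4.0*pow(x2,4)+pow(x3,4))/3.0;
    exact4=(pow(x3,5)-pow(x1,5))/5.0;        /* integral of x^4 */
    printf("x^3: simpson=%.12f exact=%.12f (agree)\n",simp3,exact3);
    printf("x^4: simpson=%.12f exact=%.12f (differ at O(h^5))\n",simp4,exact4);
    return 0;
}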

Extrapolative Formulas for a Single Interval

We are going to depart from historical practice for a moment. Many texts would give, at this point, a sequence of "Newton-Cotes Formulas of Open Type." Here is an example:

\[ \int_{x_0}^{x_5} f(x)\,dx = h\left[\frac{55}{24}f_1 + \frac{5}{24}f_2 + \frac{5}{24}f_3 + \frac{55}{24}f_4\right] + O(h^5 f^{(4)}) \]

Notice that the integral from a = x_0 to b = x_5 is estimated, using only the interior points x_1, x_2, x_3, x_4. In our opinion, formulas of this type are not useful for the reasons that (i) they cannot usefully be strung together to get "extended" rules, as we are about to do with the closed formulas, and (ii) for all other possible uses they are dominated by the Gaussian integration formulas which we will introduce in §4.5.

Instead of the Newton-Cotes open formulas, let us set out the formulas for estimating the integral in the single interval from x_0 to x_1, using values of the function f at x_1, x_2, .... These will be useful building blocks for the "extended" open formulas.

\[ \int_{x_0}^{x_1} f(x)\,dx = h\,[f_1] + O(h^2 f') \qquad (4.1.7) \]

\[ \int_{x_0}^{x_1} f(x)\,dx = h\left[\frac{3}{2}f_1 - \frac{1}{2}f_2\right] + O(h^3 f'') \qquad (4.1.8) \]

\[ \int_{x_0}^{x_1} f(x)\,dx = h\left[\frac{23}{12}f_1 - \frac{16}{12}f_2 + \frac{5}{12}f_3\right] + O(h^4 f^{(3)}) \qquad (4.1.9) \]

\[ \int_{x_0}^{x_1} f(x)\,dx = h\left[\frac{55}{24}f_1 - \frac{59}{24}f_2 + \frac{37}{24}f_3 - \frac{9}{24}f_4\right] + O(h^5 f^{(4)}) \qquad (4.1.10) \]

Perhaps a word here would be in order about how formulas like the above can be derived. There are elegant ways, but the most straightforward is to write down the basic form of the formula, replacing the numerical coefficients with unknowns, say p, q, r, s. Without loss of generality take x_0 = 0 and x_1 = 1, so h = 1. Substitute in turn for f(x) (and for f_1, f_2, f_3, f_4) the functions f(x) = 1, f(x) = x, f(x) = x^2, and f(x) = x^3. Doing the integral in each case reduces the left-hand side to a number, and the right-hand side to a linear equation for the unknowns p, q, r, s. Solving the four equations produced in this way gives the coefficients.
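This recipe can itself be carried out by machine. The fragment below (a demonstration only, using a naive Gaussian elimination rather than any routine of Chapter 2) sets up the four equations for formula (4.1.10) and solves them, recovering the coefficients 55/24, −59/24, 37/24, −9/24:

/* Derive the coefficients of equation (4.1.10) by the method just described:
   demand exactness for f(x)=1,x,x^2,x^3 on [x0,x1]=[0,1] with h=1 and nodes
   at x=1,2,3,4, then solve the resulting 4x4 linear system. A throwaway
   check, not an NR routine. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a[4][5];
    int i,j,k;

    for (i=0;i<4;i++) {                     /* row i: exactness for f(x)=x^i */
        for (j=0;j<4;j++) a[i][j]=pow(j+1.0,(double)i);  /* nodes x=1,2,3,4 */
        a[i][4]=1.0/(i+1.0);                /* integral of x^i from 0 to 1 */
    }
    for (k=0;k<4;k++)                       /* elimination without pivoting */
        for (i=k+1;i<4;i++) {               /* (adequate for this small system) */
            double m=a[i][k]/a[k][k];
            for (j=k;j<5;j++) a[i][j] -= m*a[k][j];
        }
    for (i=3;i>=0;i--) {                    /* back substitution */
        for (j=i+1;j<4;j++) a[i][4] -= a[i][j]*a[j][4];
        a[i][4] /= a[i][i];
    }
    printf("p,q,r,s = %g %g %g %g  (times 24: %g %g %g %g)\n",
        a[0][4],a[1][4],a[2][4],a[3][4],
        24*a[0][4],24*a[1][4],24*a[2][4],24*a[3][4]);
    return 0;                               /* expect 55 -59 37 -9, as in (4.1.10) */
}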

Extended Formulas (Closed)

If we use equation (4.1.3) N − 1 times, to do the integration in the intervals (x_1, x_2), (x_2, x_3), ..., (x_{N-1}, x_N), and then add the results, we obtain an "extended" or "composite" formula for the integral from x_1 to x_N.

Extended trapezoidal rule:

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{1}{2}f_1 + f_2 + f_3 + \cdots + f_{N-1} + \frac{1}{2}f_N\right] + O\!\left(\frac{(b-a)^3 f''}{N^2}\right) \qquad (4.1.11) \]

Here we have written the error estimate in terms of the interval b − a and the number of points N instead of in terms of h. This is clearer, since one is usually holding a and b fixed and wanting to know (e.g.) how much the error will be decreased by taking twice as many steps (in this case, it is by a factor of 4). In subsequent equations we will show only the scaling of the error term with the number of steps.

For reasons that will not become clear until §4.2, equation (4.1.11) is in fact the most important equation in this section, the basis for most practical quadrature schemes.

The extended formula of order 1/N^3 is:

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{5}{12}f_1 + \frac{13}{12}f_2 + f_3 + f_4 + \cdots + f_{N-2} + \frac{13}{12}f_{N-1} + \frac{5}{12}f_N\right] + O\!\left(\frac{1}{N^3}\right) \qquad (4.1.12) \]

(We will see in a moment where this comes from.)

If we apply equation (4.1.4) to successive, nonoverlapping pairs of intervals, we get the extended Simpson's rule:

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{1}{3}f_1 + \frac{4}{3}f_2 + \frac{2}{3}f_3 + \frac{4}{3}f_4 + \cdots + \frac{2}{3}f_{N-2} + \frac{4}{3}f_{N-1} + \frac{1}{3}f_N\right] + O\!\left(\frac{1}{N^4}\right) \qquad (4.1.13) \]

Notice that the 2/3, 4/3 alternation continues throughout the interior of the evaluation. Many people believe that the wobbling alternation somehow contains deep information about the integral of their function that is not apparent to mortal eyes. In fact, the alternation is an artifact of using the building block (4.1.4). Another extended formula with the same order as Simpson's rule is

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{3}{8}f_1 + \frac{7}{6}f_2 + \frac{23}{24}f_3 + f_4 + f_5 + \cdots + f_{N-3} + \frac{23}{24}f_{N-2} + \frac{7}{6}f_{N-1} + \frac{3}{8}f_N\right] + O\!\left(\frac{1}{N^4}\right) \qquad (4.1.14) \]

This equation is constructed by fitting cubic polynomials through successive groups of four points; we defer details to §18.3, where a similar technique is used in the solution of integral equations. We can, however, tell you where equation (4.1.12) came from. It is Simpson's extended rule, averaged with a modified version of itself in which the first and last step are done with the trapezoidal rule (4.1.3). The trapezoidal step is two orders lower than Simpson's rule; however, its contribution to the integral goes down as an additional power of N (since it is used only twice, not N times). This makes the resulting formula of degree one less than Simpson.
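For tabulated data, equation (4.1.13) can of course be applied as a direct sum. The following fragment is a sketch of that (the name simpext is ours; it assumes an odd number of unit-offset tabulated values, in double precision for clarity). The preferred construction of Simpson's rule from the trapezoidal rule appears in §4.2 as routine qsimp.

/* Direct implementation of the extended Simpson's rule (4.1.13) on n
   tabulated points f[1..n] with spacing h; n must be odd and >= 3.
   A sketch only, not an NR routine. */
double simpext(double f[], int n, double h)
{
    int j;
    double s=f[1]+f[n];           /* endpoint weights 1 (times h/3) */

    for (j=2;j<=n-1;j++)          /* interior weights alternate 4,2 */
        s += (j % 2 == 0 ? 4.0 : 2.0)*f[j];
    return s*h/3.0;
}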

Extended Formulas (Open and Semi-open)

We can construct open and semi-open extended formulas by adding the closed formulas (4.1.11)–(4.1.14), evaluated for the second and subsequent steps, to the extrapolative open formulas for the first step, (4.1.7)–(4.1.10). As discussed immediately above, it is consistent to use an end step that is of one order lower than the (repeated) interior step. The resulting formulas for an interval open at both ends are as follows:

Equations (4.1.7) and (4.1.11) give

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{3}{2}f_2 + f_3 + f_4 + \cdots + f_{N-2} + \frac{3}{2}f_{N-1}\right] + O\!\left(\frac{1}{N^2}\right) \qquad (4.1.15) \]

Equations (4.1.8) and (4.1.12) give

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{23}{12}f_2 + \frac{7}{12}f_3 + f_4 + f_5 + \cdots + f_{N-3} + \frac{7}{12}f_{N-2} + \frac{23}{12}f_{N-1}\right] + O\!\left(\frac{1}{N^3}\right) \qquad (4.1.16) \]

Equations (4.1.9) and (4.1.13) give

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{27}{12}f_2 + 0 + \frac{13}{12}f_4 + \frac{4}{3}f_5 + \frac{2}{3}f_6 + \cdots + \frac{4}{3}f_{N-4} + \frac{13}{12}f_{N-3} + 0 + \frac{27}{12}f_{N-1}\right] + O\!\left(\frac{1}{N^4}\right) \qquad (4.1.17) \]

The interior points alternate 4/3 and 2/3. If we want to avoid this alternation, we can combine equations (4.1.9) and (4.1.14), giving

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{55}{24}f_2 - \frac{1}{6}f_3 + \frac{11}{8}f_4 + f_5 + f_6 + \cdots + f_{N-4} + \frac{11}{8}f_{N-3} - \frac{1}{6}f_{N-2} + \frac{55}{24}f_{N-1}\right] + O\!\left(\frac{1}{N^4}\right) \qquad (4.1.18) \]

We should mention in passing another extended open formula, for use where the limits of integration are located halfway between tabulated abscissas. This one is known as the extended midpoint rule, and is accurate to the same order as (4.1.15):

\[ \int_{x_1}^{x_N} f(x)\,dx = h\,[f_{3/2} + f_{5/2} + f_{7/2} + \cdots + f_{N-3/2} + f_{N-1/2}] + O\!\left(\frac{1}{N^2}\right) \qquad (4.1.19) \]
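The extended midpoint rule is equally direct to apply. Here is a sketch (ours, in the style of this chapter's quadrature routines but not one of them, and in double precision for clarity), evaluating the integrand at the n subinterval midpoints of [a,b]:

/* The extended midpoint rule (4.1.19) as a direct sum over the n
   subinterval midpoints a+(j-0.5)*h. A sketch, not an NR routine. */
double midpnt_sum(double (*func)(double), double a, double b, int n)
{
    int j;
    double h=(b-a)/n,s=0.0;

    for (j=1;j<=n;j++) s += (*func)(a+(j-0.5)*h);
    return s*h;
}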

Figure 4.2.1. Sequential calls to the routine trapzd incorporate the information from previous calls and evaluate the integrand only at those new points necessary to refine the grid. The bottom line shows the totality of function evaluations after the fourth call. The routine qsimp, by weighting the intermediate results, transforms the trapezoid rule into Simpson's rule with essentially no additional overhead.

There are also formulas of higher order for this situation, but we will refrain from giving them.

The semi-open formulas are just the obvious combinations of equations (4.1.11)–(4.1.14) with (4.1.15)–(4.1.18), respectively. At the closed end of the integration, use the weights from the former equations; at the open end use the weights from the latter equations. One example should give the idea, the formula with error term decreasing as 1/N^3 which is closed on the right and open on the left:

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{23}{12}f_2 + \frac{7}{12}f_3 + f_4 + f_5 + \cdots + f_{N-2} + \frac{13}{12}f_{N-1} + \frac{5}{12}f_N\right] + O\!\left(\frac{1}{N^3}\right) \qquad (4.1.20) \]

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §25.4. [1]
Isaacson, E., and Keller, H.B. 1966, Analysis of Numerical Methods (New York: Wiley), §7.1.

4.2 Elementary Algorithms

Our starting point is equation (4.1.11), the extended trapezoidal rule. There are two facts about the trapezoidal rule which make it the starting point for a variety of algorithms. One fact is rather obvious, while the second is rather "deep."

The obvious fact is that, for a fixed function f(x) to be integrated between fixed limits a and b, one can double the number of intervals in the extended trapezoidal rule without losing the benefit of previous work. The coarsest implementation of the trapezoidal rule is to average the function at its endpoints a and b. The first stage of refinement is to add to this average the value of the function at the halfway point. The second stage of refinement is to add the values at the 1/4 and 3/4 points. And so on (see Figure 4.2.1). Without further ado we can write a routine with this kind of logic to it:

#define FUNC(x) ((*func)(x))

float trapzd(float (*func)(float), float a, float b, int n)
This routine computes the nth stage of refinement of an extended trapezoidal rule. func is
input as a pointer to the function to be integrated between limits a and b, also input. When
called with n=1, the routine returns the crudest estimate of the integral of f(x) from a to b.
Subsequent calls with n=2,3,... (in that sequential order) will improve the accuracy by adding
2^(n-2) additional interior points.
{
    float x,tnm,sum,del;
    static float s;
    int it,j;

    if (n == 1) {
        return (s=0.5*(b-a)*(FUNC(a)+FUNC(b)));
    } else {
        for (it=1,j=1;j<n-1;j++) it <<= 1;
        tnm=it;
        del=(b-a)/tnm;             /* This is the spacing of the points to be added. */
        x=a+0.5*del;
        for (sum=0.0,j=1;j<=it;j++,x+=del) sum += FUNC(x);
        s=0.5*(s+(b-a)*sum/tnm);   /* This replaces s by its refined value. */
        return s;
    }
}

The above routine is a workhorse that can be harnessed in several ways. The simplest and crudest is to integrate a function by the extended trapezoidal rule where you know in advance the number of steps you want. Much better is to refine the trapezoidal rule until some specified degree of accuracy has been attained:

#include <math.h>
#define EPS 1.0e-5
#define JMAX 20

float qtrap(float (*func)(float), float a, float b)
Returns the integral of the function func from a to b. The parameters EPS can be set to the
desired fractional accuracy and JMAX so that 2 to the power JMAX-1 is the maximum allowed
number of steps. Integration is performed by the trapezoidal rule.
{
    float trapzd(float (*func)(float), float a, float b, int n);
    void nrerror(char error_text[]);
    int j;
    float s,olds=0.0;              /* Initial value of olds is arbitrary. */

    for (j=1;j<=JMAX;j++) {
        s=trapzd(func,a,b,j);
        if (j > 5)                 /* Avoid spurious early convergence. */
            if (fabs(s-olds) < EPS*fabs(olds) ||
                (s == 0.0 && olds == 0.0)) return s;
        olds=s;
    }
    nrerror("Too many steps in routine qtrap");
    return 0.0;                    /* Never get here. */
}
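A typical call to qtrap looks like the following sketch; the integrand and limits are illustrative only (the name integrand is ours), and the fragment assumes qtrap, trapzd, and nrerror are compiled alongside.

/* Usage sketch for qtrap: integrate a smooth test function.
   Expect convergence to about EPS fractional accuracy. */
#include <stdio.h>
#include <math.h>

float qtrap(float (*func)(float), float a, float b);

float integrand(float x)
{
    return x*exp(-x);              /* integral from 0 to 1 is 1 - 2/e */
}

int main(void)
{
    float s=qtrap(integrand,0.0,1.0);
    printf("qtrap: %f   exact: %f\n",s,1.0-2.0/exp(1.0));
    return 0;
}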

Unsophisticated as it is, routine qtrap is in fact a fairly robust way of doing integrals of functions that are not very smooth. Increased sophistication will usually translate into a higher-order method whose efficiency will be greater only for sufficiently smooth integrands. qtrap is the method of choice, e.g., for an integrand which is a function of a variable that is linearly interpolated between measured data points. Be sure that you do not require too stringent an EPS, however: If qtrap takes too many steps in trying to achieve your required accuracy, accumulated roundoff errors may start increasing, and the routine may never converge. A value of 10^{-6} is just on the edge of trouble for most 32-bit machines; it is achievable when the convergence is moderately rapid, but not otherwise.

We come now to the "deep" fact about the extended trapezoidal rule, equation (4.1.11). It is this: The error of the approximation, which begins with a term of order 1/N^2, is in fact entirely even when expressed in powers of 1/N. This follows directly from the Euler-Maclaurin Summation Formula,

\[ \int_{x_1}^{x_N} f(x)\,dx = h\left[\frac{1}{2}f_1 + f_2 + f_3 + \cdots + f_{N-1} + \frac{1}{2}f_N\right] - \frac{B_2 h^2}{2!}\left(f'_N - f'_1\right) - \cdots - \frac{B_{2k} h^{2k}}{(2k)!}\left(f^{(2k-1)}_N - f^{(2k-1)}_1\right) - \cdots \qquad (4.2.1) \]

Here B_{2k} is a Bernoulli number, defined by the generating function

\[ \frac{t}{e^t - 1} = \sum_{n=0}^{\infty} B_n \frac{t^n}{n!} \qquad (4.2.2) \]

with the first few even values (odd values vanish except for B_1 = −1/2)

\[ B_0 = 1 \quad B_2 = \frac{1}{6} \quad B_4 = -\frac{1}{30} \quad B_6 = \frac{1}{42} \quad B_8 = -\frac{1}{30} \quad B_{10} = \frac{5}{66} \quad B_{12} = -\frac{691}{2730} \qquad (4.2.3) \]

Equation (4.2.1) is not a convergent expansion, but rather only an asymptotic expansion whose error when truncated at any point is always less than twice the magnitude of the first neglected term. The reason that it is not convergent is that the Bernoulli numbers become very large, e.g.,

\[ B_{50} = \frac{495057205241079648212477525}{66} \]

The key point is that only even powers of h occur in the error series of (4.2.1). This fact is not, in general, shared by the higher-order quadrature rules in §4.1. For example, equation (4.1.12) has an error series beginning with O(1/N^3), but continuing with all subsequent powers of 1/N: 1/N^4, 1/N^5, etc.
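The even-power error law is easy to watch in action: for a smooth integrand the extended trapezoidal error shrinks by almost exactly a factor of 4 on each doubling of N, the signature of a pure 1/N^2 leading term. A throwaway demonstration (ours, in double precision, on an illustrative integrand):

/* Print extended-trapezoidal errors for the integral of e^x on [0,1]
   as N doubles; the ratio of successive errors approaches 4. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int n,j,k;
    double h,s,err,olderr=0.0,exact=exp(1.0)-1.0;

    for (k=1;k<=6;k++) {
        n=2;
        for (j=1;j<k;j++) n *= 2;      /* N = 2,4,8,...,64 intervals */
        h=1.0/n;
        s=0.5*(exp(0.0)+exp(1.0));     /* endpoint weights 1/2 */
        for (j=1;j<n;j++) s += exp(j*h);
        s *= h;
        err=s-exact;
        printf("N=%3d  error=% .3e  ratio=%.4f\n",n,err,
            k>1 ? olderr/err : 0.0);
        olderr=err;
    }
    return 0;
}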

Suppose we evaluate (4.1.11) with $N$ steps, getting a result $S_N$, and then again with $2N$ steps, getting a result $S_{2N}$. (This is done by any two consecutive calls of trapzd.) The leading error term in the second evaluation will be 1/4 the size of the error in the first evaluation. Therefore the combination

$$S = \frac{4}{3}S_{2N} - \frac{1}{3}S_N \qquad (4.2.4)$$

will cancel out the leading order error term. But there is no error term of order $1/N^3$, by (4.2.1). The surviving error is of order $1/N^4$, the same as Simpson's rule. In fact, it should not take long for you to see that (4.2.4) is exactly Simpson's rule (4.1.13), alternating 2/3's, 4/3's, and all. This is the preferred method for evaluating that rule, and we can write it as a routine exactly analogous to qtrap above:

#include <math.h>
#define EPS 1.0e-6
#define JMAX 20

float qsimp(float (*func)(float), float a, float b)
Returns the integral of the function func from a to b. The parameters can be set to the desired fractional accuracy, EPS, and JMAX so that 2 to the power JMAX-1 is the maximum allowed number of steps. Integration is performed by Simpson's rule.
{
    float trapzd(float (*func)(float), float a, float b, int n);
    void nrerror(char error_text[]);
    int j;
    float s,st,ost=0.0,os=0.0;

    for (j=1;j<=JMAX;j++) {
        st=trapzd(func,a,b,j);
        s=(4.0*st-ost)/3.0;                Compare equation (4.2.4), above.
        if (j > 5)                         Avoid spurious early convergence.
            if (fabs(s-os) < EPS*fabs(os) ||
                (s == 0.0 && os == 0.0)) return s;
        os=s;
        ost=st;
    }
    nrerror("Too many steps in routine qsimp");
    return 0.0;                            Never get here.
}

The routine qsimp will in general be more efficient than qtrap (i.e., require fewer function evaluations) when the function to be integrated has a finite 4th derivative (i.e., a continuous 3rd derivative). The combination of qsimp and its necessary workhorse trapzd is a good one for light-duty work.

CITED REFERENCES AND FURTHER READING:
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §3.3.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §§7.4.1–7.4.2.
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), §5.3.

4.3 Romberg Integration

We can view Romberg's method as the natural generalization of the routine qsimp in the last section to integration schemes that are of higher order than Simpson's rule. The basic idea is to use the results from $k$ successive refinements of the extended trapezoidal rule (implemented in trapzd) to remove all terms in the error series up to but not including $O(1/N^{2k})$. The routine qsimp is the case $k = 2$. This is one example of a very general idea that goes by the name of Richardson's deferred approach to the limit: Perform some numerical algorithm for various values of a parameter $h$, and then extrapolate the result to the continuum limit $h = 0$.

Equation (4.2.4), which subtracts off the leading error term, is a special case of polynomial extrapolation. In the more general Romberg case, we can use Neville's algorithm (see §3.1) to extrapolate the successive refinements to zero stepsize. Neville's algorithm can in fact be coded very concisely within a Romberg integration routine. For clarity of the program, however, it seems better to do the extrapolation by function call to polint, already given in §3.1.

#include <math.h>
#define EPS 1.0e-6
#define JMAX 20
#define JMAXP (JMAX+1)
#define K 5
Here EPS is the fractional accuracy desired, as determined by the extrapolation error estimate; JMAX limits the total number of steps; K is the number of points used in the extrapolation.

float qromb(float (*func)(float), float a, float b)
Returns the integral of the function func from a to b. Integration is performed by Romberg's method of order 2K, where, e.g., K=2 is Simpson's rule.
{
    void polint(float xa[], float ya[], int n, float x, float *y, float *dy);
    float trapzd(float (*func)(float), float a, float b, int n);
    void nrerror(char error_text[]);
    float ss,dss;
    float s[JMAXP],h[JMAXP+1];             These store the successive trapezoidal approxi-
    int j;                                 mations and their relative stepsizes.

    h[1]=1.0;
    for (j=1;j<=JMAX;j++) {
        s[j]=trapzd(func,a,b,j);
        if (j >= K) {
            polint(&h[j-K],&s[j-K],K,0.0,&ss,&dss);
            if (fabs(dss) <= EPS*fabs(ss)) return ss;
        }
        h[j+1]=0.25*h[j];
        This is a key step: The factor is 0.25 even though the stepsize is decreased by only 0.5. This makes the extrapolation a polynomial in $h^2$ as allowed by equation (4.2.1), not just a polynomial in $h$.
    }
    nrerror("Too many steps in routine qromb");
    return 0.0;                            Never get here.
}

The routine qromb, along with its required trapzd and polint, is quite powerful for sufficiently smooth (e.g., analytic) integrands, integrated over intervals which contain no singularities, and where the endpoints are also nonsingular. qromb, in such circumstances, takes many, many fewer function evaluations than either of the routines in §4.2. For example, the integral

$$\int_0^2 x^4 \log\!\left(x + \sqrt{x^2+1}\right) dx$$

converges (with parameters as shown above) on the very first extrapolation, after just 5 calls to trapzd, while qsimp requires 8 calls (8 times as many evaluations of the integrand) and qtrap requires 13 calls (making 256 times as many evaluations of the integrand).
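A driver for this example might look as follows; it is our own sketch, with the integrand written out explicitly, and the closed-form value, $(32/5)\ln(2+\sqrt{5}) - 8\sqrt{5}/15 + 8/75 \approx 8.153364$, quoted only as a check:

#include <stdio.h>
#include <math.h>

float qromb(float (*func)(float), float a, float b);

/* The integrand of the worked example above. */
float fint(float x)
{
    return x*x*x*x*log(x+sqrt(x*x+1.0));
}

int main(void)
{
    printf("integral = %f\n", qromb(fint,0.0,2.0));
    /* expected: about 8.153364 */
    return 0;
}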

CITED REFERENCES AND FURTHER READING:
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §§3.4–3.5.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §§7.4.1–7.4.2.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §4.10–2.

4.4 Improper Integrals

For our present purposes, an integral will be "improper" if it has any of the following problems:
• its integrand goes to a finite limiting value at finite upper and lower limits, but cannot be evaluated right on one of those limits (e.g., $\sin x / x$ at $x = 0$)
• its upper limit is $\infty$, or its lower limit is $-\infty$
• it has an integrable singularity at either limit (e.g., $x^{-1/2}$ at $x = 0$)
• it has an integrable singularity at a known place between its upper and lower limits
• it has an integrable singularity at an unknown place between its upper and lower limits

If an integral is infinite (e.g., $\int_1^\infty x^{-1}\,dx$), or does not exist in a limiting sense (e.g., $\int_{-\infty}^{\infty} \cos x\,dx$), we do not call it improper; we call it impossible. No amount of clever algorithmics will return a meaningful answer to an ill-posed problem.

In this section we will generalize the techniques of the preceding two sections to cover the first four problems on the above list. A more advanced discussion of quadrature with integrable singularities occurs in Chapter 18, notably §18.3. The fifth problem, singularity at unknown location, can really only be handled by the use of a variable stepsize differential equation integration routine, as will be given in Chapter 16.

We need a workhorse like the extended trapezoidal rule (equation 4.1.11), but one which is an open formula in the sense of §4.1, i.e., does not require the integrand to be evaluated at the endpoints. Equation (4.1.19), the extended midpoint rule, is the best choice. The reason is that (4.1.19) shares with (4.1.11) the "deep" property of having an error series that is entirely even in $h$.

Indeed there is a formula, not as well known as it ought to be, called the Second Euler-Maclaurin summation formula,

$$\int_{x_1}^{x_N} f(x)\,dx = h\left[f_{3/2} + f_{5/2} + f_{7/2} + \cdots + f_{N-3/2} + f_{N-1/2}\right] + \frac{B_2 h^2}{4}\left(f'_N - f'_1\right) + \cdots + \frac{B_{2k}\,h^{2k}}{(2k)!}\left(1 - 2^{-2k+1}\right)\left(f^{(2k-1)}_N - f^{(2k-1)}_1\right) + \cdots \qquad (4.4.1)$$

This equation can be derived by writing out (4.2.1) with stepsize $h$, then writing it out again with stepsize $h/2$, then subtracting the first from twice the second.

It is not possible to double the number of steps in the extended midpoint rule and still have the benefit of previous function evaluations (try it!). However, it is possible to triple the number of steps and do so. Shall we do this, or double and accept the loss? On the average, tripling does a factor $\sqrt{3}$ of unnecessary work, since the "right" number of steps for a desired accuracy criterion may in fact fall anywhere in the logarithmic interval implied by tripling. For doubling, the factor is only $\sqrt{2}$, but we lose an extra factor of 2 in being unable to use all the previous evaluations. Since $1.732 < 2 \times 1.414$, it is better to triple.

Here is the resulting routine, which is directly comparable to trapzd.

#define FUNC(x) ((*func)(x))

float midpnt(float (*func)(float), float a, float b, int n)
This routine computes the n th stage of refinement of an extended midpoint rule. func is input as a pointer to the function to be integrated between limits a and b, also input. When called with n=1, the routine returns the crudest estimate of $\int_a^b f(x)\,dx$. Subsequent calls with n=2,3,... (in that sequential order) will improve the accuracy of s by adding $(2/3) \times 3^{n-1}$ additional interior points. s should not be modified between sequential calls.
{
    float x,tnm,sum,del,ddel;
    static float s;
    int it,j;

    if (n == 1) {
        return (s=(b-a)*FUNC(0.5*(a+b)));
    } else {
        for(it=1,j=1;j<n-1;j++) it *= 3;
        tnm=it;
        del=(b-a)/(3.0*tnm);
        ddel=del+del;                      The added points alternate in spacing between del and ddel.
        x=a+0.5*del;
        sum=0.0;
        for (j=1;j<=it;j++) {
            sum += FUNC(x);
            x += ddel;
            sum += FUNC(x);
            x += del;
        }
        s=(s+(b-a)*sum/tnm)/3.0;           The new sum is combined with the old integral to give a refined integral.
        return s;
    }
}

The routine midpnt can exactly replace trapzd in a driver routine like qtrap (§4.2); one simply changes trapzd(func,a,b,j) to midpnt(func,a,b,j), and perhaps also decreases the parameter JMAX since $3^{JMAX-1}$ (from step tripling) is a much larger number than $2^{JMAX-1}$ (step doubling).

The open formula implementation analogous to Simpson's rule (qsimp in §4.2) substitutes midpnt for trapzd and decreases JMAX as above, but now also changes the extrapolation step to be

    s=(9.0*st-ost)/8.0;

since, when the number of steps is tripled, the error decreases to 1/9th its size, not 1/4th as with step doubling.

Either the modified qtrap or the modified qsimp will fix the first problem on the list at the beginning of this section. Yet more sophisticated is to generalize Romberg integration in like manner:

#include <math.h>
#define EPS 1.0e-6
#define JMAX 14
#define JMAXP (JMAX+1)
#define K 5

float qromo(float (*func)(float), float a, float b,
    float (*choose)(float(*)(float), float, float, int))
Romberg integration on an open interval. Returns the integral of the function func from a to b, using any specified integrating function choose and Romberg's method. Normally choose will be an open formula, not evaluating the function at the endpoints. It is assumed that choose triples the number of steps on each call, and that its error series contains only even powers of the number of steps. The routines midpnt, midinf, midsql, midsqu, midexp are possible choices for choose. The parameters have the same meaning as in qromb.
{
    void polint(float xa[], float ya[], int n, float x, float *y, float *dy);
    void nrerror(char error_text[]);
    int j;
    float ss,dss,h[JMAXP+1],s[JMAXP];

    h[1]=1.0;
    for (j=1;j<=JMAX;j++) {
        s[j]=(*choose)(func,a,b,j);
        if (j >= K) {
            polint(&h[j-K],&s[j-K],K,0.0,&ss,&dss);
            if (fabs(dss) <= EPS*fabs(ss)) return ss;
        }
        h[j+1]=h[j]/9.0;                   This is where the assumption of step tripling and an even error series is used.
    }
    nrerror("Too many steps in routine qromo");
    return 0.0;                            Never get here.
}

Don't be put off by qromo's complicated ANSI declaration. A typical invocation (integrating the Bessel function $Y_0(x)$ from 0 to 2) is simply

#include "nr.h"
float answer;
...
answer=qromo(bessy0,0.0,2.0,midpnt);

The differences between qromo and qromb (§4.3) are so slight that it is perhaps gratuitous to list qromo in full. It, however, is an excellent driver routine for solving all the other problems of improper integrals in our first list (except the intractable fifth), as we shall now see.

The basic trick for improper integrals is to make a change of variables to eliminate the singularity, or to map an infinite range of integration to a finite one. For example, the identity

$$\int_a^b f(x)\,dx = \int_{1/b}^{1/a} \frac{1}{t^2}\,f\!\left(\frac{1}{t}\right) dt \qquad ab > 0 \qquad (4.4.2)$$

can be used with either $b \to \infty$ and $a$ positive, or with $a \to -\infty$ and $b$ negative, and works for any function which decreases towards infinity faster than $1/x^2$.

You can make the change of variable implied by (4.4.2) either analytically and then use (e.g.) qromo and midpnt to do the numerical evaluation, or you can let the numerical algorithm make the change of variable for you. We prefer the latter method as being more transparent to the user. To implement equation (4.4.2) we simply write a modified version of midpnt, called midinf, which allows b to be infinite (or, more precisely, a very large number on your particular machine, such as $1 \times 10^{30}$), or a to be negative and infinite.

#define FUNC(x) ((*funk)(1.0/(x))/((x)*(x)))      Effects the change of variable.

float midinf(float (*funk)(float), float aa, float bb, int n)
This routine is an exact replacement for midpnt, i.e., returns the n th stage of refinement of the integral of funk from aa to bb, except that the function is evaluated at evenly spaced points in 1/x rather than in x. This allows the upper limit bb to be as large and positive as the computer allows, or the lower limit aa to be as large and negative, but not both. aa and bb must have the same sign.
{
    float x,tnm,sum,del,ddel,b,a;
    static float s;
    int it,j;

    b=1.0/aa;                              These two statements change the limits of integration.
    a=1.0/bb;
    if (n == 1) {                          From this point on, the routine is identical to midpnt.
        return (s=(b-a)*FUNC(0.5*(a+b)));
    } else {
        for(it=1,j=1;j<n-1;j++) it *= 3;
        tnm=it;
        del=(b-a)/(3.0*tnm);
        ddel=del+del;
        x=a+0.5*del;
        sum=0.0;
        for (j=1;j<=it;j++) {
            sum += FUNC(x);
            x += ddel;
            sum += FUNC(x);
            x += del;
        }
        return (s=(s+(b-a)*sum/tnm)/3.0);
    }
}
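As an illustration (our own test case, not from the text): $\int_1^\infty dx/(1+x^2) = \pi/4$, and under the mapping (4.4.2) the transformed integrand $1/(1+t^2)$ stays finite and smooth on the whole transformed interval:

#include <stdio.h>

float qromo(float (*func)(float), float a, float b,
    float (*choose)(float(*)(float), float, float, int));
float midinf(float (*funk)(float), float aa, float bb, int n);

/* Illustrative integrand, decaying as 1/x^2 at infinity. */
float finv(float x)
{
    return 1.0/(1.0+x*x);
}

int main(void)
{
    printf("integral = %f\n", qromo(finv,1.0,1.0e30,midinf));
    /* expected: pi/4 = 0.785398 */
    return 0;
}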

If you need to integrate from a negative lower limit to positive infinity, you do this by breaking the integral into two pieces at some positive value, for example,

    answer=qromo(funk,-5.0,2.0,midpnt)+qromo(funk,2.0,1.0e30,midinf);

Where should you choose the breakpoint? At a sufficiently large positive value so that the function funk is at least beginning to approach its asymptotic decrease to zero value at infinity. The polynomial extrapolation implicit in the second call to qromo deals with a polynomial in $1/x$, not in $x$.

To deal with an integral that has an integrable power-law singularity at its lower limit, one also makes a change of variable. If the integrand diverges as $(x-a)^{-\gamma}$, $0 \le \gamma < 1$, near $x = a$, use the identity

$$\int_a^b f(x)\,dx = \frac{1}{1-\gamma}\int_0^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}}\,f\!\left(t^{\frac{1}{1-\gamma}} + a\right) dt \qquad (b > a) \qquad (4.4.3)$$

If the singularity is at the upper limit, use the identity

$$\int_a^b f(x)\,dx = \frac{1}{1-\gamma}\int_0^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}}\,f\!\left(b - t^{\frac{1}{1-\gamma}}\right) dt \qquad (b > a) \qquad (4.4.4)$$

If there is a singularity at both limits, divide the integral at an interior breakpoint as in the example above.

Equations (4.4.3) and (4.4.4) are particularly simple in the case of inverse square-root singularities, a case that occurs frequently in practice:

$$\int_a^b f(x)\,dx = \int_0^{\sqrt{b-a}} 2t\,f(a + t^2)\,dt \qquad (b > a) \qquad (4.4.5)$$

for a singularity at $a$, and

$$\int_a^b f(x)\,dx = \int_0^{\sqrt{b-a}} 2t\,f(b - t^2)\,dt \qquad (b > a) \qquad (4.4.6)$$

for a singularity at $b$. Once again, we can implement these changes of variable transparently to the user by defining substitute routines for midpnt which make the change of variable automatically:

#include <math.h>
#define FUNC(x) (2.0*(x)*(*funk)(aa+(x)*(x)))

float midsql(float (*funk)(float), float aa, float bb, int n)
This routine is an exact replacement for midpnt, except that it allows for an inverse square-root singularity in the integrand at the lower limit aa.
{
    float x,tnm,sum,del,ddel,a,b;
    static float s;
    int it,j;

    b=sqrt(bb-aa);
    a=0.0;
    if (n == 1) {
        The rest of the routine is exactly like midpnt and is omitted.
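A hedged usage sketch (our own example): the integrand $\cos x/\sqrt{x}$ has an inverse square-root singularity at $x = 0$, and by (4.4.5) its integral over $(0,1)$ equals $2\int_0^1 \cos(t^2)\,dt \approx 1.80905$:

#include <stdio.h>
#include <math.h>

float qromo(float (*func)(float), float a, float b,
    float (*choose)(float(*)(float), float, float, int));
float midsql(float (*funk)(float), float aa, float bb, int n);

/* Illustrative integrand with an inverse square-root singularity at x = 0. */
float fsing(float x)
{
    return cos(x)/sqrt(x);
}

int main(void)
{
    printf("integral = %f\n", qromo(fsing,0.0,1.0,midsql));
    /* expected: about 1.80905 */
    return 0;
}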

Similarly,

#include <math.h>
#define FUNC(x) (2.0*(x)*(*funk)(bb-(x)*(x)))

float midsqu(float (*funk)(float), float aa, float bb, int n)
This routine is an exact replacement for midpnt, except that it allows for an inverse square-root singularity in the integrand at the upper limit bb.
{
    float x,tnm,sum,del,ddel,a,b;
    static float s;
    int it,j;

    b=sqrt(bb-aa);
    a=0.0;
    if (n == 1) {
        The rest of the routine is exactly like midpnt and is omitted.

One last example should suffice to show how these formulas are derived in general. Suppose the upper limit of integration is infinite, and the integrand falls off exponentially. Then we want a change of variable that maps $e^{-x}\,dx$ into $(\pm)\,dt$ (with the sign chosen to keep the upper limit of the new variable larger than the lower limit). Doing the integration by inspection gives

$$t = e^{-x} \qquad \text{or} \qquad x = -\log t \qquad (4.4.7)$$

so that

$$\int_{x=a}^{x=\infty} f(x)\,dx = \int_{t=0}^{t=e^{-a}} f(-\log t)\,\frac{dt}{t} \qquad (4.4.8)$$

The user-transparent implementation would be

#include <math.h>
#define FUNC(x) ((*funk)(-log(x))/(x))

float midexp(float (*funk)(float), float aa, float bb, int n)
This routine is an exact replacement for midpnt, except that bb is assumed to be infinite (value passed not actually used). It is assumed that the function funk decreases exponentially rapidly at infinity.
{
    float x,tnm,sum,del,ddel,a,b;
    static float s;
    int it,j;

    b=exp(-aa);
    a=0.0;
    if (n == 1) {
        The rest of the routine is exactly like midpnt and is omitted.
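A minimal sketch of midexp in use (our own test case): $\int_0^\infty e^{-x}\cos x\,dx = 1/2$ exactly, and the integrand decays exponentially as required:

#include <stdio.h>
#include <math.h>

float qromo(float (*func)(float), float a, float b,
    float (*choose)(float(*)(float), float, float, int));
float midexp(float (*funk)(float), float aa, float bb, int n);

/* Illustrative exponentially decaying integrand. */
float fdecay(float x)
{
    return exp(-x)*cos(x);
}

int main(void)
{
    printf("integral = %f\n", qromo(fdecay,0.0,1.0e30,midexp));
    /* expected: 0.5 */
    return 0;
}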

CITED REFERENCES AND FURTHER READING:
Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 4.
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §7.4.3, p. 294.
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §3.7, p. 152.

4.5 Gaussian Quadratures and Orthogonal Polynomials

In the formulas of §4.1, the integral of a function was approximated by the sum of its functional values at a set of equally spaced points, multiplied by certain aptly chosen weighting coefficients. We saw that as we allowed ourselves more freedom in choosing the coefficients, we could achieve integration formulas of higher and higher order. The idea of Gaussian quadratures is to give ourselves the freedom to choose not only the weighting coefficients, but also the location of the abscissas at which the function is to be evaluated: They will no longer be equally spaced. Thus, we will have twice the number of degrees of freedom at our disposal; it will turn out that we can achieve Gaussian quadrature formulas whose order is, essentially, twice that of the Newton-Cotes formula with the same number of function evaluations.

Does this sound too good to be true? Well, in a sense it is. The catch is a familiar one, which cannot be overemphasized: High order is not the same as high accuracy. High order translates to high accuracy only when the integrand is very smooth, in the sense of being "well-approximated by a polynomial."

There is, however, one additional feature of Gaussian quadrature formulas that adds to their usefulness: We can arrange the choice of weights and abscissas to make the integral exact for a class of integrands "polynomials times some known function $W(x)$" rather than for the usual class of integrands "polynomials." The function $W(x)$ can then be chosen to remove integrable singularities from the desired integral. Given $W(x)$, in other words, and given an integer $N$, we can find a set of weights $w_j$ and abscissas $x_j$ such that the approximation

$$\int_a^b W(x)\,f(x)\,dx \approx \sum_{j=1}^N w_j\,f(x_j) \qquad (4.5.1)$$

is exact if $f(x)$ is a polynomial. For example, to do the integral

$$\int_{-1}^{1} \frac{\exp(-\cos^2 x)}{\sqrt{1-x^2}}\,dx \qquad (4.5.2)$$

(not a very natural looking integral, it must be admitted), we might well be interested in a Gaussian quadrature formula based on the choice

$$W(x) = \frac{1}{\sqrt{1-x^2}} \qquad (4.5.3)$$

in the interval $(-1,1)$. (This particular choice is called Gauss-Chebyshev integration, for reasons that will become clear shortly.)

Notice that the integration formula (4.5.1) can also be written with the weight function $W(x)$ not overtly visible: Define $g(x) \equiv W(x) f(x)$ and $v_j \equiv w_j / W(x_j)$. Then (4.5.1) becomes

$$\int_a^b g(x)\,dx \approx \sum_{j=1}^N v_j\,g(x_j) \qquad (4.5.4)$$

Where did the function $W(x)$ go? It is lurking there, ready to give high-order accuracy to integrands of the form polynomials times $W(x)$, and ready to deny high-order accuracy to integrands that are otherwise perfectly smooth and well-behaved. When you find tabulations of the weights and abscissas for a given $W(x)$, you have to determine carefully whether they are to be used with a formula in the form of (4.5.1), or like (4.5.4).

Here is an example of a quadrature routine that contains the tabulated abscissas and weights for the case $W(x) = 1$ and $N = 10$. Since the weights and abscissas are, in this case, symmetric around the midpoint of the range of integration, there are actually only five distinct values of each:

float qgaus(float (*func)(float), float a, float b)
Returns the integral of the function func between a and b, by ten-point Gauss-Legendre integration: the function is evaluated exactly ten times at interior points in the range of integration.
{
    int j;
    float xr,xm,dx,s;
    static float x[]={0.0,0.1488743389,0.4333953941,    The abscissas and weights.
        0.6794095682,0.8650633666,0.9739065285};        First value of each array
    static float w[]={0.0,0.2955242247,0.2692667193,    not used.
        0.2190863625,0.1494513491,0.0666713443};

    xm=0.5*(b+a);
    xr=0.5*(b-a);
    s=0;                                   Will be twice the average value of the function, since the ten
    for (j=1;j<=5;j++) {                   weights (five numbers above each used twice) sum to 2.
        dx=xr*x[j];
        s += w[j]*((*func)(xm+dx)+(*func)(xm-dx));
    }
    return s *= xr;                        Scale the answer to the range of integration.
}

The above routine illustrates that one can use Gaussian quadratures without necessarily understanding the theory behind them: One just locates tabulated weights and abscissas in a book (e.g., [1] or [2]). However, the theory is very pretty, and it will come in handy if you ever need to construct your own tabulation of weights and abscissas for an unusual choice of $W(x)$. We will therefore give, without any proofs, some useful results that will enable you to do this. Several of the results assume that $W(x)$ does not change sign inside $(a,b)$, which is usually the case in practice.
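First, though, a minimal sketch of qgaus in use (our own driver with an illustrative integrand; since $W(x) = 1$ here, one simply passes the bare integrand):

#include <stdio.h>
#include <math.h>

float qgaus(float (*func)(float), float a, float b);

/* Illustrative smooth integrand. */
float fgauss(float x)
{
    return exp(-x);
}

int main(void)
{
    printf("integral = %f\n", qgaus(fgauss,0.0,1.0));
    /* expected: 1 - 1/e = 0.632121, to essentially full float precision */
    return 0;
}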

The theory behind Gaussian quadratures goes back to Gauss in 1814, who used continued fractions to develop the subject. In 1826 Jacobi rederived Gauss's results by means of orthogonal polynomials. The systematic treatment of arbitrary weight functions $W(x)$ using orthogonal polynomials is largely due to Christoffel in 1877. To introduce these orthogonal polynomials, let us fix the interval of interest to be $(a,b)$. We can define the "scalar product of two functions $f$ and $g$ over a weight function $W$" as

$$\langle f | g \rangle \equiv \int_a^b W(x)\,f(x)\,g(x)\,dx \qquad (4.5.5)$$

The scalar product is a number, not a function of $x$. Two functions are said to be orthogonal if their scalar product is zero. A function is said to be normalized if its scalar product with itself is unity. A set of functions that are all mutually orthogonal and also all individually normalized is called an orthonormal set.

We can find a set of polynomials (i) that includes exactly one polynomial of order $j$, called $p_j(x)$, for each $j = 0, 1, 2, \ldots$, and (ii) all of which are mutually orthogonal over the specified weight function $W(x)$. A constructive procedure for finding such a set is the recurrence relation

$$p_{-1}(x) \equiv 0$$
$$p_0(x) \equiv 1$$
$$p_{j+1}(x) = (x - a_j)\,p_j(x) - b_j\,p_{j-1}(x) \qquad j = 0, 1, 2, \ldots \qquad (4.5.6)$$

where

$$a_j = \frac{\langle x p_j | p_j \rangle}{\langle p_j | p_j \rangle} \qquad j = 0, 1, \ldots$$
$$b_j = \frac{\langle p_j | p_j \rangle}{\langle p_{j-1} | p_{j-1} \rangle} \qquad j = 1, 2, \ldots \qquad (4.5.7)$$

The coefficient $b_0$ is arbitrary; we can take it to be zero.

The polynomials defined by (4.5.6) are monic, i.e., the coefficient of their leading term [$x^j$ for $p_j(x)$] is unity. If we divide each $p_j(x)$ by the constant $[\langle p_j | p_j \rangle]^{1/2}$ we can render the set of polynomials orthonormal. One also encounters orthogonal polynomials with various other normalizations. You can convert from a given normalization to monic polynomials if you know that the coefficient of $x^j$ in $p_j$ is $\lambda_j$, say; then the monic polynomials are obtained by dividing each $p_j$ by $\lambda_j$. Note that the coefficients in the recurrence relation (4.5.6) depend on the adopted normalization.

The polynomial $p_j(x)$ can be shown to have exactly $j$ distinct roots in the interval $(a,b)$. Moreover, it can be shown that the roots of $p_j(x)$ "interleave" the $j-1$ roots of $p_{j-1}(x)$, i.e., there is exactly one root of the former in between each two adjacent roots of the latter. This fact comes in handy if you need to find all the roots: You can start with the one root of $p_1(x)$ and then, in turn, bracket the roots of each higher $j$, pinning them down at each stage more precisely by Newton's rule or some other root-finding scheme (see Chapter 9).

Why would you ever want to find all the roots of an orthogonal polynomial $p_j(x)$? Because the abscissas of the $N$-point Gaussian quadrature formulas (4.5.1) and (4.5.4) with weighting function $W(x)$ in the interval $(a,b)$ are precisely the roots of the orthogonal polynomial $p_N(x)$ for the same interval and weighting function. This is the fundamental theorem of Gaussian quadratures, and lets you find the abscissas for any particular case.
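A minimal sketch of evaluating $p_n(x)$ and $p_{n-1}(x)$ by the recurrence (4.5.6), assuming the coefficients have already been placed in (hypothetical) zero-based arrays a[0..n-1] and b[0..n-1]:

/* Evaluate the monic polynomials p_n(x) and p_{n-1}(x) of (4.5.6).
   The arrays a[0..n-1] and b[0..n-1] hold a_j and b_j; b[0] is never
   used, since it multiplies p_{-1} = 0. */
void pmonic(double x, int n, double a[], double b[],
    double *pn, double *pnm1)
{
    int j;
    double pm1=0.0, p=1.0, pnew;           /* p_{-1} and p_0 */

    for (j=0;j<n;j++) {
        pnew=(x-a[j])*p-b[j]*pm1;          /* recurrence (4.5.6) */
        pm1=p;
        p=pnew;
    }
    *pn=p;                                 /* p_n(x) */
    *pnm1=pm1;                             /* p_{n-1}(x) */
}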

Once you know the abscissas $x_1, \ldots, x_N$, you need to find the weights $w_j$, $j = 1, \ldots, N$. One way to do this (not the most efficient) is to solve the set of linear equations

$$\begin{bmatrix} p_0(x_1) & \cdots & p_0(x_N) \\ p_1(x_1) & \cdots & p_1(x_N) \\ \vdots & & \vdots \\ p_{N-1}(x_1) & \cdots & p_{N-1}(x_N) \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} \int_a^b W(x)\,p_0(x)\,dx \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (4.5.8)$$

Equation (4.5.8) simply solves for those weights such that the quadrature (4.5.1) gives the correct answer for the integral of the first $N$ orthogonal polynomials. Note that the zeros on the right-hand side of (4.5.8) appear because $p_1(x), \ldots, p_{N-1}(x)$ are all orthogonal to $p_0(x)$, which is a constant. It can be shown that, with those weights, the integral of the next $N-1$ polynomials is also exact, so that the quadrature is exact for all polynomials of degree $2N-1$ or less. Another way to evaluate the weights (though one whose proof is beyond our scope) is by the formula

$$w_j = \frac{\langle p_{N-1} | p_{N-1} \rangle}{p_{N-1}(x_j)\,p'_N(x_j)} \qquad (4.5.9)$$

where $p'_N(x_j)$ is the derivative of the orthogonal polynomial at its zero $x_j$.

The computation of Gaussian quadrature rules thus involves two distinct phases: (i) the generation of the orthogonal polynomials $p_0, \ldots, p_N$, i.e., the computation of the coefficients $a_j$, $b_j$ in (4.5.6); (ii) the determination of the zeros of $p_N(x)$, and the computation of the associated weights. For the case of the "classical" orthogonal polynomials, the coefficients $a_j$ and $b_j$ are explicitly known (equations 4.5.10 – 4.5.14 below) and phase (i) can be omitted. However, if you are confronted with a "nonclassical" weight function $W(x)$, and you don't know the coefficients $a_j$ and $b_j$, the construction of the associated set of orthogonal polynomials is not trivial. We discuss it at the end of this section.

Computation of the Abscissas and Weights

This task can range from easy to difficult, depending on how much you already know about your weight function and its associated polynomials. In the case of classical, well-studied, orthogonal polynomials, practically everything is known, including good approximations for their zeros. These can be used as starting guesses, enabling Newton's method (to be discussed in §9.4) to converge very rapidly. Newton's method requires the derivative $p'_N(x)$, which is evaluated by standard relations in terms of $p_N$ and $p_{N-1}$. The weights are then conveniently evaluated by equation (4.5.9). For the following named cases, this direct root-finding is faster, by a factor of 3 to 5, than any other method.
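Extending the earlier sketch: differentiating (4.5.6) term by term gives $p'_{j+1} = p_j + (x - a_j)\,p'_j - b_j\,p'_{j-1}$, so the derivative can be carried along in the same loop; with the normalization $\langle p_{N-1}|p_{N-1}\rangle$ supplied (here as the hypothetical argument pnorm), equation (4.5.9) then yields the weight:

/* Weight (4.5.9) at an abscissa x (assumed to be a zero of p_n).
   pnorm must be <p_{n-1}|p_{n-1}>; a[] and b[] are as in pmonic above. */
double gweight(double x, int n, double a[], double b[], double pnorm)
{
    int j;
    double pm1=0.0, p=1.0, dpm1=0.0, dp=0.0, pnew, dpnew;

    for (j=0;j<n;j++) {
        pnew=(x-a[j])*p-b[j]*pm1;          /* recurrence (4.5.6) */
        dpnew=p+(x-a[j])*dp-b[j]*dpm1;     /* its derivative */
        pm1=p;   p=pnew;
        dpm1=dp; dp=dpnew;
    }
    return pnorm/(pm1*dp);   /* (4.5.9): pm1 = p_{n-1}(x), dp = p_n'(x) */
}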

Here are the weight functions, intervals, and recurrence relations that generate the most commonly used orthogonal polynomials and their corresponding Gaussian quadrature formulas.

Gauss-Legendre:

$$W(x) = 1 \qquad -1 < x < 1$$
$$(j+1)\,P_{j+1} = (2j+1)\,x\,P_j - j\,P_{j-1} \qquad (4.5.10)$$

Gauss-Chebyshev:

$$W(x) = (1-x^2)^{-1/2} \qquad -1 < x < 1$$
$$T_{j+1} = 2x\,T_j - T_{j-1} \qquad (4.5.11)$$

Gauss-Laguerre:

$$W(x) = x^{\alpha} e^{-x} \qquad 0 < x < \infty$$
$$(j+1)\,L^{\alpha}_{j+1} = (-x + 2j + \alpha + 1)\,L^{\alpha}_j - (j+\alpha)\,L^{\alpha}_{j-1} \qquad (4.5.12)$$

Gauss-Hermite:

$$W(x) = e^{-x^2} \qquad -\infty < x < \infty$$
$$H_{j+1} = 2x\,H_j - 2j\,H_{j-1} \qquad (4.5.13)$$

Gauss-Jacobi:

$$W(x) = (1-x)^{\alpha}(1+x)^{\beta} \qquad -1 < x < 1$$
$$c_j\,P^{(\alpha,\beta)}_{j+1} = (d_j + e_j x)\,P^{(\alpha,\beta)}_j - f_j\,P^{(\alpha,\beta)}_{j-1} \qquad (4.5.14)$$

where the coefficients are $c_j = 2(j+1)(j+\alpha+\beta+1)(2j+\alpha+\beta)$, $d_j = (2j+\alpha+\beta+1)(\alpha^2-\beta^2)$, $e_j = (2j+\alpha+\beta)(2j+\alpha+\beta+1)(2j+\alpha+\beta+2)$, and $f_j = 2(j+\alpha)(j+\beta)(2j+\alpha+\beta+2)$.

#include <math.h>
#define EPS 3.0e-11                        EPS is the relative precision.

void gauleg(float x1, float x2, float x[], float w[], int n)
Given the lower and upper limits of integration x1 and x2, and given n, this routine returns arrays x[1..n] and w[1..n] of length n, containing the abscissas and weights of the Gauss-Legendre n-point quadrature formula.
{
    int m,j,i;
    double z1,z,xm,xl,pp,p3,p2,p1;         High precision is a good idea for this routine.

    m=(n+1)/2;                             The roots are symmetric in the interval, so we
    xm=0.5*(x2+x1);                        only have to find half of them.
    xl=0.5*(x2-x1);
    for (i=1;i<=m;i++) {                   Loop over the desired roots.
        z=cos(3.141592654*(i-0.25)/(n+0.5));
        Starting with the above approximation to the i th root, we enter the main loop of refinement by Newton's method.
        do {
            p1=1.0;
            p2=0.0;
            for (j=1;j<=n;j++) {           Loop up the recurrence relation to get the
                p3=p2;                     Legendre polynomial evaluated at z.
                p2=p1;
                p1=((2.0*j-1.0)*z*p2-(j-1.0)*p3)/j;
            }
            p1 is now the desired Legendre polynomial. We next compute pp, its derivative, by a standard relation involving also p2, the polynomial of one lower order.
            pp=n*(z*p1-p2)/(z*z-1.0);
            z1=z;
            z=z1-p1/pp;                    Newton's method.
        } while (fabs(z-z1) > EPS);
        x[i]=xm-xl*z;                      Scale the root to the desired interval,
        x[n+1-i]=xm+xl*z;                  and put in its symmetric counterpart.
        w[i]=2.0*xl/((1.0-z*z)*pp*pp);     Compute the weight
        w[n+1-i]=w[i];                     and its symmetric counterpart.
    }
}
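A minimal sketch of gauleg in use (our own driver; the integrand is illustrative): fill the 1-based arrays once, then sum $\sum_j w_j f(x_j)$:

#include <stdio.h>
#include <math.h>
#define N 10

void gauleg(float x1, float x2, float x[], float w[], int n);

/* Illustrative smooth integrand. */
float f(float x)
{
    return exp(-x*x);
}

int main(void)
{
    float x[N+1],w[N+1],s=0.0;
    int j;

    gauleg(0.0,1.0,x,w,N);                 /* fills x[1..N], w[1..N] */
    for (j=1;j<=N;j++) s += w[j]*f(x[j]);
    printf("integral = %f\n",s);           /* expected: about 0.746824 */
    return 0;
}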

Next we give three routines that use initial approximations for the roots given by Stroud and Secrest [2]. The first is for Gauss-Laguerre abscissas and weights, to be used with the integration formula

$$\int_0^\infty x^{\alpha} e^{-x} f(x)\,dx = \sum_{j=1}^N w_j\,f(x_j) \qquad (4.5.18)$$

#include <math.h>
#define EPS 3.0e-14                        Increase EPS if you don't have this precision.
#define MAXIT 10

void gaulag(float x[], float w[], int n, float alf)
Given alf, the parameter α of the Laguerre polynomials, this routine returns arrays x[1..n] and w[1..n] containing the abscissas and weights of the n-point Gauss-Laguerre quadrature formula. The smallest abscissa is returned in x[1], the largest in x[n].
{
    float gammln(float xx);
    void nrerror(char error_text[]);
    int i,its,j;
    float ai;
    double p1,p2,p3,pp,z,z1;               High precision is a good idea for this routine.

    for (i=1;i<=n;i++) {                   Loop over the desired roots.
        if (i == 1) {                      Initial guess for the smallest root.
            z=(1.0+alf)*(3.0+0.92*alf)/(1.0+2.4*n+1.8*alf);
        } else if (i == 2) {               Initial guess for the second root.
            z += (15.0+6.25*alf)/(1.0+0.9*alf+2.5*n);
        } else {                           Initial guess for the other roots.
            ai=i-2;
            z += ((1.0+2.55*ai)/(1.9*ai)+1.26*ai*alf/
                (1.0+3.5*ai))*(z-x[i-2])/(1.0+0.3*alf);
        }
        for (its=1;its<=MAXIT;its++) {     Refinement by Newton's method.
            p1=1.0;
            p2=0.0;
            for (j=1;j<=n;j++) {           Loop up the recurrence relation to get the
                p3=p2;                     Laguerre polynomial evaluated at z.
                p2=p1;
                p1=((2*j-1+alf-z)*p2-(j-1+alf)*p3)/j;
            }
            p1 is now the desired Laguerre polynomial. We next compute pp, its derivative, by a standard relation involving also p2, the polynomial of one lower order.
            pp=(n*p1-(n+alf)*p2)/z;
            z1=z;
            z=z1-p1/pp;                    Newton's formula.
            if (fabs(z-z1) <= EPS) break;
        }
        if (its > MAXIT) nrerror("too many iterations in gaulag");
        x[i]=z;                            Store the root and the weight.
        w[i] = -exp(gammln(alf+n)-gammln((float)n))/(pp*n*p2);
    }
}
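A quick sketch of gaulag in use (ours, with $\alpha = 0$): equation (4.5.18) then gives $\int_0^\infty e^{-x}\sin x\,dx = 1/2$, which a 10-point rule should reproduce to several digits:

#include <stdio.h>
#include <math.h>
#define N 10

void gaulag(float x[], float w[], int n, float alf);

int main(void)
{
    float x[N+1],w[N+1],s=0.0;
    int j;

    gaulag(x,w,N,0.0);                     /* alpha = 0: weight is e^{-x} on (0,inf) */
    for (j=1;j<=N;j++) s += w[j]*sin(x[j]);
    printf("integral = %f\n",s);           /* expected: near 0.5 */
    return 0;
}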

Next is a routine for Gauss-Hermite abscissas and weights. If we use the "standard" normalization of these functions, as given in equation (4.5.13), we find that the computations overflow for large $N$ because of various factorials that occur. We can avoid this by using instead the orthonormal set of polynomials $\tilde{H}_j$. They are generated by the recurrence

$$\tilde{H}_{-1} = 0 \qquad \tilde{H}_0 = \frac{1}{\pi^{1/4}} \qquad \tilde{H}_{j+1} = x\sqrt{\frac{2}{j+1}}\,\tilde{H}_j - \sqrt{\frac{j}{j+1}}\,\tilde{H}_{j-1} \qquad (4.5.19)$$

The formula for the weights becomes

$$w_j = \frac{2}{\left[\tilde{H}'_N(x_j)\right]^2} \qquad (4.5.20)$$

while the formula for the derivative with this normalization is

$$\tilde{H}'_j = \sqrt{2j}\,\tilde{H}_{j-1} \qquad (4.5.21)$$

The abscissas and weights returned by gauher are used with the integration formula

$$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx = \sum_{j=1}^N w_j\,f(x_j) \qquad (4.5.22)$$

#include <math.h>
#define EPS 3.0e-14                        Relative precision.
#define PIM4 0.7511255444649425            $1/\pi^{1/4}$.
#define MAXIT 10                           Maximum iterations.

void gauher(float x[], float w[], int n)
Given n, this routine returns arrays x[1..n] and w[1..n] containing the abscissas and weights of the n-point Gauss-Hermite quadrature formula. The largest abscissa is returned in x[1], the most negative in x[n].
{
    void nrerror(char error_text[]);
    int i,its,j,m;
    double p1,p2,p3,pp,z,z1;               High precision is a good idea for this routine.

    m=(n+1)/2;                             The roots are symmetric about the origin, so
    for (i=1;i<=m;i++) {                   we have to find only half of them.
        if (i == 1) {                      Initial guess for the largest root.
            z=sqrt((double)(2*n+1))-1.85575*pow((double)(2*n+1),-0.16667);
        } else if (i == 2) {               Initial guess for the second largest root.
            z -= 1.14*pow((double)n,0.426)/z;
        } else if (i == 3) {               Initial guess for the third largest root.
            z=1.86*z-0.86*x[1];
        } else if (i == 4) {               Initial guess for the fourth largest root.
            z=1.91*z-0.91*x[2];
        } else {                           Initial guess for the other roots.
            z=2.0*z-x[i-2];
        }
        for (its=1;its<=MAXIT;its++) {     Refinement by Newton's method.
            p1=PIM4;
            p2=0.0;
            for (j=1;j<=n;j++) {           Loop up the recurrence relation to get
                p3=p2;                     the Hermite polynomial evaluated at z.
                p2=p1;
                p1=z*sqrt(2.0/j)*p2-sqrt(((double)(j-1))/j)*p3;
            }
            p1 is now the desired Hermite polynomial. We next compute pp, its derivative, by the relation (4.5.21) using p2, the polynomial of one lower order.
            pp=sqrt((double)2*n)*p2;
            z1=z;
            z=z1-p1/pp;                    Newton's formula.
            if (fabs(z-z1) <= EPS) break;
        }
        if (its > MAXIT) nrerror("too many iterations in gauher");
        x[i]=z;                            Store the root
        x[n+1-i] = -z;                     and its symmetric counterpart.
        w[i]=2.0/(pp*pp);                  Compute the weight
        w[n+1-i]=w[i];                     and its symmetric counterpart.
    }
}

Finally, here is a routine for Gauss-Jacobi abscissas and weights, which implement the integration formula

$$\int_{-1}^{1} (1-x)^{\alpha} (1+x)^{\beta} f(x)\,dx = \sum_{j=1}^N w_j\,f(x_j) \qquad (4.5.23)$$

#include <math.h>
#define EPS 3.0e-14                        Increase EPS if you don't have this precision.
#define MAXIT 10

void gaujac(float x[], float w[], int n, float alf, float bet)
Given alf and bet, the parameters α and β of the Jacobi polynomials, this routine returns arrays x[1..n] and w[1..n] containing the abscissas and weights of the n-point Gauss-Jacobi quadrature formula. The largest abscissa is returned in x[1], the smallest in x[n].
{
    float gammln(float xx);
    void nrerror(char error_text[]);
    int i,its,j;
    float alfbet,an,bn,r1,r2,r3;
    double a,b,c,p1,p2,p3,pp,temp,z,z1;    High precision is a good idea for this routine.

    for (i=1;i<=n;i++) {                   Loop over the desired roots.
        if (i == 1) {                      Initial guess for the largest root.
            an=alf/n;
            bn=bet/n;
            r1=(1.0+alf)*(2.78/(4.0+n*n)+0.768*an/n);
            r2=1.0+1.48*an+0.96*bn+0.452*an*an+0.83*an*bn;
            z=1.0-r1/r2;
        } else if (i == 2) {               Initial guess for the second largest root.
            r1=(4.1+alf)/((1.0+alf)*(1.0+0.156*alf));
            r2=1.0+0.06*(n-8.0)*(1.0+0.12*alf)/n;
            r3=1.0+0.012*bet*(1.0+0.25*fabs(alf))/n;
            z -= (1.0-z)*r1*r2*r3;
        } else if (i == 3) {               Initial guess for the third largest root.
            r1=(1.67+0.28*alf)/(1.0+0.37*alf);
            r2=1.0+0.22*(n-8.0)/n;
            r3=1.0+8.0*bet/((6.28+bet)*n*n);
            z -= (x[1]-z)*r1*r2*r3;
        } else if (i == n-1) {             Initial guess for the second smallest root.
            r1=(1.0+0.235*bet)/(0.766+0.119*bet);
            r2=1.0/(1.0+0.639*(n-4.0)/(1.0+0.71*(n-4.0)));
            r3=1.0/(1.0+20.0*alf/((7.5+alf)*n*n));
            z += (z-x[n-3])*r1*r2*r3;
        } else if (i == n) {               Initial guess for the smallest root.
            r1=(1.0+0.37*bet)/(1.67+0.28*bet);
            r2=1.0/(1.0+0.22*(n-8.0)/n);
            r3=1.0/(1.0+8.0*alf/((6.28+alf)*n*n));
            z += (z-x[n-2])*r1*r2*r3;
        } else {                           Initial guess for the other roots.
            z=3.0*x[i-1]-3.0*x[i-2]+x[i-3];
        }
        alfbet=alf+bet;
        for (its=1;its<=MAXIT;its++) {     Refinement by Newton's method.
            temp=2.0+alfbet;               Start the recurrence with P_0 and P_1 to avoid
            p1=(alf-bet+temp*z)/2.0;       a division by zero when α + β = 0 or −1.
            p2=1.0;
            for (j=2;j<=n;j++) {           Loop up the recurrence relation to get the
                p3=p2;                     Jacobi polynomial evaluated at z.
                p2=p1;
                temp=2*j+alfbet;
                a=2*j*(j+alfbet)*(temp-2.0);
                b=(temp-1.0)*(alf*alf-bet*bet+temp*(temp-2.0)*z);
                c=2.0*(j-1+alf)*(j-1+bet)*temp;
                p1=(b*p2-c*p3)/a;
            }
            pp=(n*(alf-bet-temp*z)*p1+2.0*(n+alf)*(n+bet)*p2)/(temp*(1.0-z*z));
            p1 is now the desired Jacobi polynomial. We next compute pp, its derivative, by a standard relation involving also p2, the polynomial of one lower order.
            z1=z;
            z=z1-p1/pp;                    Newton's formula.

            if (fabs(z-z1) <= EPS) break;
        }
        if (its > MAXIT) nrerror("too many iterations in gaujac");
        x[i]=z;                            Store the root and the weight.
        w[i]=exp(gammln(alf+n)+gammln(bet+n)-gammln(n+1.0)-
            gammln(n+alfbet+1.0))*temp*pow(2.0,alfbet)/(pp*p2);
    }
}

Legendre polynomials are special cases of Jacobi polynomials with $\alpha = \beta = 0$, but it is worth having the separate routine for them, gauleg, given above. Chebyshev polynomials correspond to $\alpha = \beta = -1/2$ (see §5.8). They have analytic abscissas and weights:

$$x_j = \cos\left(\frac{\pi\left(j - \frac{1}{2}\right)}{N}\right)$$
$$w_j = \frac{\pi}{N} \qquad (4.5.24)$$
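Because (4.5.24) is closed form, Gauss-Chebyshev quadrature needs no root-finding at all. Here is a small sketch (our own) that applies it to the integral (4.5.2); the weight $1/\sqrt{1-x^2}$ is absorbed by the rule, so only $f(x) = \exp(-\cos^2 x)$ is coded:

#include <stdio.h>
#include <math.h>
#define N 16
#define PI 3.141592653589793

/* The f(x) of integrand (4.5.2); the weight W(x) is handled by the rule. */
double f(double x)
{
    return exp(-cos(x)*cos(x));
}

int main(void)
{
    int j;
    double s=0.0;

    for (j=1;j<=N;j++)
        s += (PI/N)*f(cos(PI*(j-0.5)/N));  /* abscissas and weights (4.5.24) */
    printf("integral (4.5.2) = %.10f\n",s);
    return 0;
}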

Case of Known Recurrences

Turn now to the case where you do not know good initial guesses for the zeros of your orthogonal polynomials, but you do have available the coefficients $a_j$ and $b_j$ that generate them. As we have seen, the zeros of $p_N(x)$ are the abscissas for the $N$-point Gaussian quadrature formula. The most useful computational formula for the weights is equation (4.5.9) above, since the derivative $p'_N$ can be efficiently computed by the derivative of (4.5.6) in the general case, or by special relations for the classical polynomials. Note that (4.5.9) is valid as written only for monic polynomials; for other normalizations, there is an extra factor of $\lambda_N / \lambda_{N-1}$, where $\lambda_N$ is the coefficient of $x^N$ in $p_N$.

Except in those special cases already discussed, the best way to find the abscissas is not to use a root-finding method like Newton's method on $p_N(x)$. Rather, it is generally faster to use the Golub-Welsch [3] algorithm, which is based on a result of Wilf [4]. This algorithm notes that if you bring the term $x p_j$ to the left-hand side of (4.5.6) and the term $p_{j+1}$ to the right-hand side, the recurrence relation can be written in matrix form as

$$x \begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_{N-2} \\ p_{N-1} \end{bmatrix} = \begin{bmatrix} a_0 & 1 & & & \\ b_1 & a_1 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & b_{N-2} & a_{N-2} & 1 \\ & & & b_{N-1} & a_{N-1} \end{bmatrix} \cdot \begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_{N-2} \\ p_{N-1} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ p_N \end{bmatrix}$$

or

$$x\,\mathbf{p} = \mathbf{T} \cdot \mathbf{p} + p_N\,\mathbf{e}_{N-1} \qquad (4.5.25)$$

Here $\mathbf{T}$ is a tridiagonal matrix, $\mathbf{p}$ is a column vector of $p_0, p_1, \ldots, p_{N-1}$, and $\mathbf{e}_{N-1}$ is a unit vector with a 1 in the $(N-1)$st (last) position and zeros elsewhere. The matrix $\mathbf{T}$ can be symmetrized by a diagonal similarity transformation $\mathbf{D}$ to give

$$\mathbf{J} = \mathbf{D}\mathbf{T}\mathbf{D}^{-1} = \begin{bmatrix} a_0 & \sqrt{b_1} & & & \\ \sqrt{b_1} & a_1 & \sqrt{b_2} & & \\ & \ddots & \ddots & \ddots & \\ & & \sqrt{b_{N-2}} & a_{N-2} & \sqrt{b_{N-1}} \\ & & & \sqrt{b_{N-1}} & a_{N-1} \end{bmatrix} \qquad (4.5.26)$$

The matrix $\mathbf{J}$ is called the Jacobi matrix (not to be confused with other matrices named after Jacobi that arise in completely different problems!). Now we see from (4.5.25) that $p_N(x_j) = 0$ is equivalent to $x_j$ being an eigenvalue of $\mathbf{T}$. Since eigenvalues are preserved by a similarity transformation, $x_j$ is an eigenvalue of the symmetric tridiagonal matrix $\mathbf{J}$. Moreover, Wilf [4] shows that if $\mathbf{v}_j$ is the eigenvector corresponding to the eigenvalue $x_j$, normalized so that $\mathbf{v} \cdot \mathbf{v} = 1$, then

$$w_j = \mu_0\,v_{j,1}^2 \qquad (4.5.27)$$

where

$$\mu_0 = \int_a^b W(x)\,dx \qquad (4.5.28)$$

and where $v_{j,1}$ is the first component of $\mathbf{v}_j$. As we shall see in Chapter 11, finding all eigenvalues and eigenvectors of a symmetric tridiagonal matrix is a relatively efficient and well-conditioned procedure. We accordingly give a routine, gaucof, for finding the abscissas and weights, given the coefficients $a_j$ and $b_j$. Remember that if you know the recurrence relation for orthogonal polynomials that are not normalized to be monic, you can easily convert it to monic form by means of the quantities $\lambda_j$.

#include <math.h>
#include "nrutil.h"

void gaucof(int n, float a[], float b[], float amu0, float x[], float w[])
Computes the abscissas and weights for a Gaussian quadrature formula from the Jacobi matrix. On input, a[1..n] and b[1..n] are the coefficients of the recurrence relation for the set of monic orthogonal polynomials. The quantity $\mu_0 \equiv \int_a^b W(x)\,dx$ is input as amu0. The abscissas x[1..n] are returned in descending order, with the corresponding weights in w[1..n]. The arrays a and b are modified. Execution can be speeded up by modifying tqli and eigsrt to compute only the first component of each eigenvector.
{
    void eigsrt(float d[], float **v, int n);
    void tqli(float d[], float e[], int n, float **z);
    int i,j;
    float **z;

    z=matrix(1,n,1,n);
    for (i=1;i<=n;i++) {
        if (i != 1) b[i]=sqrt(b[i]);       Set up superdiagonal of Jacobi matrix.
        for (j=1;j<=n;j++) z[i][j]=(float)(i == j);
        Set up identity matrix for tqli to compute eigenvectors.
    }
    tqli(a,b,n,z);
    eigsrt(a,z,n);                         Sort eigenvalues into descending order.
    for (i=1;i<=n;i++) {
        x[i]=a[i];
        w[i]=amu0*z[1][i]*z[1][i];         Equation (4.5.27).
    }
    free_matrix(z,1,n,1,n);
}

Orthogonal Polynomials with Nonclassical Weights

This somewhat specialized subsection will tell you what to do if your weight function is not one of the classical ones dealt with above and you do not know the $a_j$'s and $b_j$'s of the recurrence relation (4.5.6) to use in gaucof. Then, a method of finding the $a_j$'s and $b_j$'s is needed.

The procedure of Stieltjes is to compute $a_0$ from (4.5.7), then $p_1(x)$ from (4.5.6). Knowing $p_0$ and $p_1$, we can compute $a_1$ and $b_1$ from (4.5.7), and so on. But how are we to compute the inner products in (4.5.7)?

The textbook approach is to represent each $p_j(x)$ explicitly as a polynomial in $x$ and to compute the inner products by multiplying out term by term. This will be feasible if we know the first $2N$ moments of the weight function,

$$\mu_j = \int_a^b x^j W(x)\,dx \qquad j = 0, 1, \ldots, 2N-1 \qquad (4.5.29)$$

However, the solution of the resulting set of algebraic equations for the coefficients $a_j$ and $b_j$ in terms of the moments $\mu_j$ is in general extremely ill-conditioned. Even in double precision, it is not unusual to lose all accuracy by the time $N = 12$. We thus reject any procedure based on the moments (4.5.29).

Sack and Donovan [5] discovered that the numerical stability is greatly improved if, instead of using powers of $x$ as a set of basis functions to represent the $p_j$'s, one uses some other known set of orthogonal polynomials $\pi_j(x)$, say. Roughly speaking, the improved stability occurs because the polynomial basis "samples" the interval $(a,b)$ better than the power basis when the inner product integrals are evaluated, especially if its weight function resembles $W(x)$.

So assume that we know the modified moments

$$\nu_j = \int_a^b \pi_j(x)\,W(x)\,dx \qquad j = 0, 1, \ldots, 2N-1 \qquad (4.5.30)$$

where the $\pi_j$'s satisfy a recurrence relation analogous to (4.5.6),

$$\pi_{-1}(x) \equiv 0$$
$$\pi_0(x) \equiv 1$$
$$\pi_{j+1}(x) = (x - \alpha_j)\,\pi_j(x) - \beta_j\,\pi_{j-1}(x) \qquad j = 0, 1, 2, \ldots \qquad (4.5.31)$$

and the coefficients $\alpha_j$, $\beta_j$ are known explicitly. Then Wheeler [6] has given an efficient $O(N^2)$ algorithm equivalent to that of Sack and Donovan for finding $a_j$ and $b_j$ via a set of intermediate quantities

$$\sigma_{k,l} = \langle p_k | \pi_l \rangle \qquad k, l \ge -1 \qquad (4.5.32)$$

Initialize

$$\sigma_{-1,l} = 0 \qquad l = 1, 2, \ldots, 2N-2$$
$$\sigma_{0,l} = \nu_l \qquad l = 0, 1, \ldots, 2N-1$$
$$a_0 = \alpha_0 + \frac{\nu_1}{\nu_0}$$
$$b_0 = 0 \qquad (4.5.33)$$

Then, for $k = 1, 2, \ldots, N-1$, compute

$$\sigma_{k,l} = \sigma_{k-1,l+1} - (a_{k-1} - \alpha_l)\,\sigma_{k-1,l} - b_{k-1}\,\sigma_{k-2,l} + \beta_l\,\sigma_{k-1,l-1} \qquad l = k, k+1, \ldots, 2N-k-1$$
$$a_k = \alpha_k + \frac{\sigma_{k,k+1}}{\sigma_{k,k}} - \frac{\sigma_{k-1,k}}{\sigma_{k-1,k-1}}$$
$$b_k = \frac{\sigma_{k,k}}{\sigma_{k-1,k-1}} \qquad (4.5.34)$$

Note that the normalization factors can also easily be computed if needed:

$$\langle p_0 | p_0 \rangle = \nu_0$$
$$\langle p_j | p_j \rangle = b_j\,\langle p_{j-1} | p_{j-1} \rangle \qquad j = 1, 2, \ldots \qquad (4.5.35)$$

You can find a derivation of the above algorithm in Ref. [7].

Wheeler's algorithm requires that the modified moments (4.5.30) be accurately computed. In practical cases there is often a closed form, or else recurrence relations can be used.

The algorithm is extremely successful for finite intervals $(a,b)$. For infinite intervals, the algorithm does not completely remove the ill-conditioning. In this case, Gautschi [8,9] recommends reducing the interval to a finite interval by a change of variable, and then using a suitable discretization procedure to compute the inner products. You will have to consult the references for details.

We give the routine orthog for generating the coefficients $a_j$ and $b_j$ by Wheeler's algorithm, given the coefficients $\alpha_j$ and $\beta_j$, and the modified moments $\nu_j$. For consistency with gaucof, the vectors $\alpha$, $\beta$, $a$ and $b$ are 1-based. Correspondingly, we increase the indices of the $\sigma$ matrix by 2, i.e., sig[k,l] $= \sigma_{k-2,l-2}$.

#include "nrutil.h"

void orthog(int n, float anu[], float alpha[], float beta[], float a[], float b[])
Computes the coefficients $a_j$ and $b_j$, $j = 0, \ldots, N-1$, of the recurrence relation for monic orthogonal polynomials with weight function $W(x)$ by Wheeler's algorithm. On input, the arrays alpha[1..2*n-1] and beta[1..2*n-1] are the coefficients $\alpha_j$ and $\beta_j$, $j = 0, \ldots, 2N-2$, of the recurrence relation for the chosen basis of orthogonal polynomials. The modified moments $\nu_j$ are input in anu[1..2*n]. The first n coefficients are returned in a[1..n] and b[1..n].
{
    int k,l;
    float **sig;
    int looptmp;

    sig=matrix(1,2*n+1,1,2*n+1);
    looptmp=2*n;
    for (l=3;l<=looptmp;l++) sig[1][l]=0.0;    Initialization, Equation (4.5.33).
    looptmp++;
    for (l=2;l<=looptmp;l++) sig[2][l]=anu[l-1];
    a[1]=alpha[1]+anu[2]/anu[1];
    b[1]=0.0;
    for (k=3;k<=n+1;k++) {                     Equation (4.5.34).
        looptmp=2*n-k+3;
        for (l=k;l<=looptmp;l++) {
            sig[k][l]=sig[k-1][l+1]+(alpha[l-1]-a[k-2])*sig[k-1][l]-
                b[k-2]*sig[k-2][l]+beta[l-1]*sig[k-1][l-1];
        }
        a[k-1]=alpha[k-1]+sig[k][k+1]/sig[k][k]-sig[k-1][k]/sig[k-1][k-1];
        b[k-1]=sig[k][k]/sig[k-1][k-1];
    }
    free_matrix(sig,1,2*n+1,1,2*n+1);
}

As an example of the use of orthog, consider the problem [7] of generating orthogonal polynomials with the weight function $W(x) = -\log x$ on the interval $(0,1)$. A suitable set of $\pi_j$'s is the shifted Legendre polynomials

$$\pi_j = \frac{(j!)^2}{(2j)!}\,P_j(2x - 1) \qquad (4.5.36)$$

The factor in front of $P_j$ makes the polynomials monic. The coefficients in the recurrence relation (4.5.31) are

$$\alpha_j = \frac{1}{2} \qquad j = 0, 1, \ldots$$
$$\beta_j = \frac{1}{4\,(4 - j^{-2})} \qquad j = 1, 2, \ldots \qquad (4.5.37)$$

while the modified moments are

$$\nu_j = \begin{cases} 1 & j = 0 \\[1ex] \dfrac{(-1)^j (j!)^2}{j(j+1)(2j)!} & j \ge 1 \end{cases} \qquad (4.5.38)$$

A call to orthog with this input allows one to generate the required polynomials to machine accuracy for very large $N$, and hence do Gaussian quadrature with this weight function. Before Sack and Donovan's observation, this seemingly simple problem was essentially intractable.
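Putting the pieces together, a sketch of a complete driver for this example might read as follows (ours, not from the library; it must be linked with orthog, gaucof, and their dependencies tqli, eigsrt, and nrutil, and the test value $\int_0^1 x\,(-\log x)\,dx = 1/4$ is an easy analytic check):

#include <stdio.h>
#include "nrutil.h"
#define N 10

void orthog(int n, float anu[], float alpha[], float beta[],
    float a[], float b[]);
void gaucof(int n, float a[], float b[], float amu0, float x[], float w[]);

int main(void)
{
    int j;
    float anu[2*N+1],alpha[2*N],beta[2*N],a[N+1],b[N+1],x[N+1],w[N+1];
    double r=1.0,s=0.0;

    for (j=1;j<=2*N-1;j++) {               /* equation (4.5.37); beta[1] (= beta_0) is arbitrary */
        alpha[j]=0.5;
        beta[j]=(j == 1 ? 0.0 : 1.0/(4.0*(4.0-1.0/((double)(j-1)*(j-1)))));
    }
    anu[1]=1.0;                            /* modified moments, equation (4.5.38) */
    for (j=1;j<=2*N-1;j++) {
        r *= j/(2.0*(2*j-1));              /* r = (j!)^2/(2j)! built up iteratively */
        anu[j+1]=((j & 1) ? -r : r)/((double)j*(j+1));
    }
    orthog(N,anu,alpha,beta,a,b);
    gaucof(N,a,b,anu[1],x,w);              /* amu0 = nu_0 = integral of W = 1 */
    for (j=1;j<=N;j++) s += w[j]*x[j];
    printf("int_0^1 x*(-log x) dx = %f\n",s);   /* expected: 0.25 */
    return 0;
}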

A call to orthog with this input allows one to generate the required polynomials to machine accuracy for very large N, and hence do Gaussian quadrature with this weight function. Before Sack and Donovan's observation, this seemingly simple problem was essentially intractable.

Extensions of Gaussian Quadrature

There are many different ways in which the ideas of Gaussian quadrature have been extended. One important extension is the case of preassigned nodes: Some points are required to be included in the set of abscissas, and the problem is to choose the weights and the remaining abscissas to maximize the degree of exactness of the quadrature rule. The most common cases are Gauss-Radau quadrature, where one of the nodes is an endpoint of the interval, either a or b, and Gauss-Lobatto quadrature, where both a and b are nodes. Golub [10] has given an algorithm similar to gaucof for these cases.

The second important extension is the Gauss-Kronrod formulas. For ordinary Gaussian quadrature formulas, as N increases the sets of abscissas have no points in common. This means that if you compare results with increasing N as a way of estimating the quadrature error, you cannot reuse the previous function evaluations. Kronrod [11] posed the problem of searching for optimal sequences of rules, each of which reuses all abscissas of its predecessor. If one starts with N = m, say, and then adds n new points, one has 2n + m free parameters: the n new abscissas and weights, and m new weights for the fixed previous abscissas. The maximum degree of exactness one would expect to achieve would therefore be 2n + m - 1. The question is whether this maximum degree of exactness can actually be achieved in practice, when the abscissas are required to all lie inside (a,b). The answer to this question is not known in general.

Kronrod showed that if you choose n = m + 1, an optimal extension can be found for Gauss-Legendre quadrature. Patterson [12] showed how to compute continued extensions of this kind. Sequences such as N = 10, 21, 43, 87, ... are popular in automatic quadrature routines [13] that attempt to integrate a function until some specified accuracy has been achieved.

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §25.4. [1]
Stroud, A.H., and Secrest, D. 1966, Gaussian Quadrature Formulas (Englewood Cliffs, NJ: Prentice-Hall). [2]
Golub, G.H., and Welsch, J.H. 1969, Mathematics of Computation, vol. 23, pp. 221–230 and A1–A10. [3]
Wilf, H.S. 1962, Mathematics for the Physical Sciences (New York: Wiley), Problem 9, p. 80. [4]
Sack, R.A., and Donovan, A.F. 1971/72, Numerische Mathematik, vol. 18, pp. 465–478. [5]
Wheeler, J.C. 1974, Rocky Mountain Journal of Mathematics, vol. 4, pp. 287–296. [6]
Gautschi, W. 1978, in Recent Advances in Numerical Analysis, C. de Boor and G.H. Golub, eds. (New York: Academic Press), pp. 45–72. [7]
Gautschi, W. 1981, in E.B. Christoffel, P.L. Butzer and F. Fehér, eds. (Basel: Birkhäuser Verlag), pp. 72–147. [8]
Gautschi, W. 1990, in Orthogonal Polynomials, P. Nevai, ed. (Dordrecht: Kluwer Academic Publishers), pp. 181–216. [9]

Golub, G.H. 1973, SIAM Review, vol. 15, pp. 318–334. [10]
Kronrod, A.S. 1964, Doklady Akademii Nauk SSSR, vol. 154, pp. 283–286 (in Russian). [11]
Patterson, T.N.L. 1968, Mathematics of Computation, vol. 22, pp. 847–856 and C1–C11; 1969, op. cit., vol. 23, p. 892. [12]
Piessens, R., de Doncker, E., Uberhuber, C.W., and Kahaner, D.K. 1983, QUADPACK: A Subroutine Package for Automatic Integration (New York: Springer-Verlag). [13]
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §3.6.
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §6.5.
Carnahan, B., Luther, H.A., and Wilkes, J.O. 1969, Applied Numerical Methods (New York: Wiley), §§2.9–2.10.
Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §§4.4–4.8.

4.6 Multidimensional Integrals

Integrals of functions of several variables, over regions with dimension greater than one, are not easy. There are two reasons for this. First, the number of function evaluations needed to sample an N-dimensional space increases as the Nth power of the number needed to do a one-dimensional integral. If you need 30 function evaluations to do a one-dimensional integral crudely, then you will likely need on the order of 30000 evaluations to reach the same crude level for a three-dimensional integral. Second, the region of integration in N-dimensional space is defined by an (N-1)-dimensional boundary which can itself be terribly complicated: It need not be convex or simply connected, for example. By contrast, the boundary of a one-dimensional integral consists of two numbers, its upper and lower limits.

The first question to be asked, when faced with a multidimensional integral, is, "can it be reduced analytically to a lower dimensionality?" For example, so-called iterated integrals of a function of one variable f(t) can be reduced to one-dimensional integrals by the formula

    \int_0^x dt_n \int_0^{t_n} dt_{n-1} \cdots \int_0^{t_3} dt_2 \int_0^{t_2} f(t_1)\,dt_1 = \frac{1}{(n-1)!} \int_0^x (x-t)^{n-1} f(t)\,dt        (4.6.1)

Alternatively, the function may have some special symmetry in the way it depends on its independent variables. If the boundary also has this symmetry, then the dimension can be reduced. In three dimensions, for example, the integration of a spherically symmetric function over a spherical region reduces, in polar coordinates, to a one-dimensional integral.

The next questions to be asked will guide your choice between two entirely different approaches to doing the problem. The questions are: Is the shape of the boundary of the region of integration simple or complicated? Inside the region, is the integrand smooth and simple, or complicated, or locally strongly peaked?

Does the problem require high accuracy, or does it require an answer accurate only to a percent, or a few percent?

If your answers are that the boundary is complicated, the integrand is not strongly peaked in very small regions, and relatively low accuracy is tolerable, then your problem is a good candidate for Monte Carlo integration. This method is very straightforward to program, in its cruder forms. One needs only to know a region with simple boundaries that includes the complicated region of integration, plus a method of determining whether a random point is inside or outside the region of integration. Monte Carlo integration evaluates the function at a random sample of points, and estimates its integral based on that random sample. We will discuss it in more detail, and with more sophistication, in Chapter 7.

If the boundary is simple, and the function is very smooth, then the remaining approaches, breaking up the problem into repeated one-dimensional integrals, or multidimensional Gaussian quadratures, will be effective and relatively fast [1]. If you require high accuracy, these approaches are in any case the only ones available to you, since Monte Carlo methods are by nature asymptotically slow to converge.

For low accuracy, use repeated one-dimensional integration or multidimensional Gaussian quadratures when the integrand is slowly varying and smooth in the region of integration, Monte Carlo when the integrand is oscillatory or discontinuous, but not strongly peaked in small regions.

If the integrand is strongly peaked in small regions, and you know where those regions are, break the integral up into several regions so that the integrand is smooth in each, and do each separately. If you don't know where the strongly peaked regions are, you might as well (at the level of sophistication of this book) quit: It is hopeless to expect an integration routine to search out unknown pockets of large contribution in a huge N-dimensional space. (But see §7.8.)

If, on the basis of the above guidelines, you decide to pursue the repeated one-dimensional integration approach, here is how it works. For definiteness, we will consider the case of a three-dimensional integral in x, y, z-space. Two dimensions, or more than three dimensions, are entirely analogous.

The first step is to specify the region of integration by (i) its lower and upper limits in x, which we will denote x_1 and x_2; (ii) its lower and upper limits in y at a specified value of x, denoted y_1(x) and y_2(x); and (iii) its lower and upper limits in z at specified x and y, denoted z_1(x,y) and z_2(x,y). In other words, find the numbers x_1 and x_2, and the functions y_1(x), y_2(x), z_1(x,y), and z_2(x,y) such that

    I \equiv \iiint dx\,dy\,dz\,f(x,y,z)
      = \int_{x_1}^{x_2} dx \int_{y_1(x)}^{y_2(x)} dy \int_{z_1(x,y)}^{z_2(x,y)} dz\,f(x,y,z)        (4.6.2)
For example, a two-dimensional integral over a circle of radius one centered on the origin becomes

    \int_{-1}^{1} dx \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\,f(x,y)        (4.6.3)

Figure 4.6.1. Function evaluations for a two-dimensional integral over an irregular region, shown schematically. The outer integration routine, in x, requests values of the inner, y, integral at locations along the x axis of its own choosing. The inner integration routine then evaluates the function at y locations suitable to it. This is more accurate in general than, e.g., evaluating the function on a Cartesian mesh of points.

Now we can define a function G(x,y) that does the innermost integral,

    G(x,y) \equiv \int_{z_1(x,y)}^{z_2(x,y)} f(x,y,z)\,dz        (4.6.4)

and a function H(x) that does the integral of G(x,y),

    H(x) \equiv \int_{y_1(x)}^{y_2(x)} G(x,y)\,dy        (4.6.5)

and finally our answer as an integral over H(x),

    I = \int_{x_1}^{x_2} H(x)\,dx        (4.6.6)

In an implementation of equations (4.6.4)–(4.6.6), some basic one-dimensional integration routine (e.g., qgaus in the program following) gets called recursively: once to evaluate the outer integral I, then many times to evaluate the middle integral H, then even more times to evaluate the inner integral G (see Figure 4.6.1). Current values of x and y, and the pointer to your function func, are passed "over the head" of the intermediate calls through static top-level variables.

static float xsav,ysav;
static float (*nrfunc)(float,float,float);

float quad3d(float (*func)(float, float, float), float x1, float x2)
Returns the integral of a user-supplied function func over a three-dimensional region specified
by the limits x1, x2, and by the user-supplied functions yy1, yy2, z1, and z2, as defined in
(4.6.2). (The functions y1 and y2 are here called yy1 and yy2 to avoid conflict with the names
of Bessel functions in some C libraries.) Integration is performed by calling qgaus recursively.
{
    float qgaus(float (*func)(float), float a, float b);
    float f1(float x);

    nrfunc=func;
    return qgaus(f1,x1,x2);
}

float f1(float x)       This is H of eq. (4.6.5).
{
    float qgaus(float (*func)(float), float a, float b);
    float f2(float y);
    float yy1(float),yy2(float);

    xsav=x;
    return qgaus(f2,yy1(x),yy2(x));
}

float f2(float y)       This is G of eq. (4.6.4).
{
    float qgaus(float (*func)(float), float a, float b);
    float f3(float z);
    float z1(float,float),z2(float,float);

    ysav=y;
    return qgaus(f3,z1(xsav,y),z2(xsav,y));
}

float f3(float z)       The integrand f(x,y,z) evaluated at fixed x and y.
{
    return (*nrfunc)(xsav,ysav,z);
}

The necessary user-supplied functions have the following prototypes:

float func(float x,float y,float z);    The 3-dimensional function to be integrated.
float yy1(float x);
float yy2(float x);
float z1(float x,float y);
float z2(float x,float y);

CITED REFERENCES AND FURTHER READING:
Stroud, A.H. 1971, Approximate Calculation of Multiple Integrals (Englewood Cliffs, NJ: Prentice-Hall). [1]
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §7.7, p. 318.
Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §6.2.5, p. 307.
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), equations 25.4.58ff.
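As a usage sketch of our own (none of these little user functions are NR listings), the following code integrates f = 1 over the unit sphere, so that quad3d returns the sphere's volume; with the fixed 10-point qgaus of §4.5 doing each one-dimensional integral, the answer should come out close to 4\pi/3 \approx 4.18879:

#include <stdio.h>
#include <math.h>

float func(float x, float y, float z)
{
    return 1.0;                       /* integrand f = 1, so I is the volume */
}
float yy1(float x) { return -sqrt(1.0-x*x); }
float yy2(float x) { return  sqrt(1.0-x*x); }
float z1(float x, float y)
{
    float r=1.0-x*x-y*y;
    return -sqrt(r > 0.0 ? r : 0.0);  /* guard against roundoff below zero */
}
float z2(float x, float y)
{
    float r=1.0-x*x-y*y;
    return sqrt(r > 0.0 ? r : 0.0);
}

int main(void)
{
    float quad3d(float (*func)(float,float,float), float x1, float x2);

    printf("volume = %f\n",quad3d(func,-1.0,1.0));   /* roughly 4*pi/3 */
    return 0;
}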

Chapter 5. Evaluation of Functions

5.0 Introduction

The purpose of this chapter is to acquaint you with a selection of the techniques that are frequently used in evaluating functions. In Chapter 6, we will apply and illustrate these techniques by giving routines for a variety of specific functions. The purposes of this chapter and the next are thus mostly in harmony, but there is nevertheless some tension between them: Routines that are clearest and most illustrative of the general techniques of this chapter are not always the methods of choice for a particular special function. By comparing this chapter to the next one, you should get some idea of the balance between "general" and "special" methods that occurs in practice. Insofar as that balance favors general methods, this chapter should give you ideas about how to write your own routine for the evaluation of a function which, while "special" to you, is not so special as to be included in Chapter 6 or the standard program libraries.

CITED REFERENCES AND FURTHER READING:
Fike, C.T. 1968, Computer Evaluation of Mathematical Functions (Englewood Cliffs, NJ: Prentice-Hall).
Lanczos, C. 1956, Applied Analysis; reprinted 1988 (New York: Dover), Chapter 7.

5.1 Series and Their Convergence

Everybody knows that an analytic function can be expanded in the neighborhood of a point x_0 in a power series,

    f(x) = \sum_{k=0}^{\infty} a_k (x - x_0)^k        (5.1.1)

Such series are straightforward to evaluate. You don't, of course, evaluate the kth power of x - x_0 ab initio for each term; rather you keep the (k-1)st power and update it with a multiply. Similarly, the form of the coefficients a_k is often such as to make use of previous work: Terms like k! or (2k)! can be updated in a multiply or two.
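For instance, here is a sketch of our own (expser is a hypothetical name, not an NR routine) that sums the exponential series this way, updating x^k/k! with a single multiply per term and anticipating the stopping criterion discussed in the next paragraph:

#include <math.h>

double expser(double x)
/* exp(x) = sum over k >= 0 of x^k/k!, with the term updated by one multiply. */
{
    int k;
    double term=1.0,sum=1.0;              /* the k = 0 term */

    for (k=1;k<=100;k++) {
        term *= x/k;                      /* x^k/k! from x^(k-1)/(k-1)! */
        sum += term;
        if (fabs(term) < 1.0e-15*fabs(sum)) break;
    }
    return sum;
}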

How do you know when you have summed enough terms? In practice, the terms had better be getting small fast, otherwise the series is not a good technique to use in the first place. While not mathematically rigorous in all cases, standard practice is to quit when the term you have just added is smaller in magnitude than some small \epsilon times the magnitude of the sum thus far accumulated. (But watch out: isolated instances of a_k = 0 are possible!)

A weakness of a power series representation is that it is guaranteed not to converge farther than that distance from x_0 at which a singularity is encountered in the complex plane. This catastrophe is not usually unexpected: When you find a power series in a book (or when you work one out yourself), you will generally also know the radius of convergence. An insidious problem occurs with series that converge everywhere (in the mathematical sense), but almost nowhere fast enough to be useful in a numerical method. Two familiar examples are the sine function and the Bessel function of the first kind,

    \sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}\,x^{2k+1}        (5.1.2)

    J_n(x) = \left(\frac{x}{2}\right)^n \sum_{k=0}^{\infty} \frac{(-x^2/4)^k}{k!\,(k+n)!}        (5.1.3)

Both of these series converge for all x. But both don't even start to converge until k \approx |x|; before this, their terms are increasing. This makes these series useless for large x.

Accelerating the Convergence of Series

There are several tricks for accelerating the rate of convergence of a series (or, equivalently, of a sequence of partial sums). These tricks will not generally help in cases like (5.1.2) or (5.1.3) while the size of the terms is still increasing. For series with terms of decreasing magnitude, however, some accelerating methods can be startlingly good. Aitken's \delta^2-process is simply a formula for extrapolating the partial sums of a series whose convergence is approximately geometric. If S_{n-1}, S_n, S_{n+1} are three successive partial sums, then an improved estimate is

    S'_n \equiv S_{n+1} - \frac{(S_{n+1} - S_n)^2}{S_{n+1} - 2S_n + S_{n-1}}        (5.1.4)

You can also use (5.1.4) with n+1 and n-1 replaced by n+p and n-p, respectively, for any integer p. If you form the sequence of S'_i's, you can apply (5.1.4) a second time to that sequence, and so on. (In practice, this iteration will only rarely do much for you after the first stage.) Note that equation (5.1.4) should be computed as written; there exist algebraically equivalent forms that are much more susceptible to roundoff error.
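By way of illustration, here is a small sketch of ours (not an NR listing) that applies (5.1.4), computed exactly as written, to the partial sums of the slowly convergent alternating series 4(1 - 1/3 + 1/5 - \cdots) = \pi:

#include <stdio.h>

int main(void)
{
    int n;
    double s[12],sp;

    s[0]=4.0;                           /* partial sums S_0..S_11 of 4*(-1)^n/(2n+1) */
    for (n=1;n<12;n++) s[n]=s[n-1]+(n % 2 ? -4.0 : 4.0)/(2*n+1);
    for (n=1;n<11;n++) {                /* improved estimates S'_n, eq. (5.1.4) */
        sp=s[n+1]-(s[n+1]-s[n])*(s[n+1]-s[n])/(s[n+1]-2.0*s[n]+s[n-1]);
        printf("S_%2d = %.8f    S'_%2d = %.8f\n",n+1,s[n+1],n,sp);
    }
    return 0;                           /* S' approaches pi far faster than S */
}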

For alternating series (where the terms in the sum alternate in sign), Euler's transformation can be a powerful tool. Generally it is advisable to do a small number of terms directly, through term n-1 say, then apply the transformation to the rest of the series beginning with term n. The formula (for n even) is

    \sum_{s=0}^{\infty} (-1)^s u_s = u_0 - u_1 + u_2 - \cdots - u_{n-1} + \sum_{s=0}^{\infty} \frac{(-1)^s}{2^{s+1}} [\Delta^s u_n]        (5.1.5)

Here \Delta is the forward difference operator, i.e.,

    \Delta u_n \equiv u_{n+1} - u_n
    \Delta^2 u_n \equiv u_{n+2} - 2u_{n+1} + u_n
    \Delta^3 u_n \equiv u_{n+3} - 3u_{n+2} + 3u_{n+1} - u_n \qquad \text{etc.}        (5.1.6)

Of course you don't actually do the infinite sum on the right-hand side of (5.1.5), but only the first, say, p terms, thus requiring the first p differences (5.1.6) obtained from the terms starting at u_n.

Euler's transformation can be applied not only to convergent series. In some cases it will produce accurate answers from the first terms of a series that is formally divergent. It is widely used in the summation of asymptotic series. In this case it is generally wise not to sum farther than where the terms start increasing in magnitude; and you should devise some independent numerical check that the results are meaningful.

There is an elegant and subtle implementation of Euler's transformation due to van Wijngaarden [1]: It incorporates the terms of the original alternating series one at a time, in order. For each incorporation it either increases p by 1, equivalent to computing one further difference (5.1.6), or else retroactively increases n by 1, without having to redo all the difference calculations based on the old n value! The decision as to which to increase, n or p, is taken in such a way as to make the convergence most rapid. Van Wijngaarden's technique requires only one vector of saved partial differences. Here is the algorithm:

#include <math.h>

void eulsum(float *sum, float term, int jterm, float wksp[])
Incorporates into sum the jterm'th term, with value term, of an alternating series. sum is
input as the previous partial sum, and is output as the new partial sum. The first call to this
routine, with the first term in the series, should be with jterm=1. On the second call, term
should be set to the second term of the series, with sign opposite to that of the first call, and
jterm should be 2. And so on. wksp is a workspace array provided by the calling program,
dimensioned at least as large as the maximum number of terms to be incorporated.
{
    int j;
    static int nterm;
    float tmp,dum;

    if (jterm == 1) {                   Initialize:
        nterm=1;                        Number of saved differences in wksp.
        *sum=0.5*(wksp[1]=term);        Return first estimate.
    } else {
        tmp=wksp[1];
        wksp[1]=term;
        for (j=1;j<=nterm-1;j++) {      Update saved quantities by van Wijngaarden's algorithm.
            dum=wksp[j+1];

            wksp[j+1]=0.5*(wksp[j]+tmp);
            tmp=dum;
        }
        wksp[nterm+1]=0.5*(wksp[nterm]+tmp);
        if (fabs(wksp[nterm+1]) <= fabs(wksp[nterm]))   Favorable to increase p,
            *sum += (0.5*wksp[++nterm]);                and the table becomes longer.
        else                                            Favorable to increase n,
            *sum += wksp[nterm+1];                      the table doesn't become longer.
    }
}

The powerful Euler technique is not directly applicable to a series of positive terms. Occasionally it is useful to convert a series of positive terms into an alternating series, just so that the Euler transformation can be used! Van Wijngaarden has given a transformation for accomplishing this [1]:

    \sum_{r=1}^{\infty} v_r = \sum_{r=1}^{\infty} (-1)^{r-1} w_r        (5.1.7)

where

    w_r \equiv v_r + 2v_{2r} + 4v_{4r} + 8v_{8r} + \cdots        (5.1.8)

Equations (5.1.7) and (5.1.8) replace a simple sum by a two-dimensional sum, each term in (5.1.7) being itself an infinite sum (5.1.8). This may seem a strange way to save on work! Since, however, the indices in (5.1.8) increase tremendously rapidly, as powers of 2, it often requires only a few terms to converge (5.1.8) to extraordinary accuracy. You do, however, need to be able to compute the v_r's efficiently for "random" values r. The standard "updating" tricks for sequential r's, mentioned above following equation (5.1.1), can't be used.

Actually, Euler's transformation is a special case of a more general transformation of power series. Suppose that some known function g(z) has the series

    g(z) = \sum_{n=0}^{\infty} b_n z^n        (5.1.9)

and that you want to sum the new, unknown, series

    f(z) = \sum_{n=0}^{\infty} c_n b_n z^n        (5.1.10)

Then it is not hard to show (see [2]) that equation (5.1.10) can be written as

    f(z) = \sum_{n=0}^{\infty} [\Delta^{(n)} c_0] \frac{g^{(n)}}{n!} z^n        (5.1.11)

which often converges much more rapidly. Here \Delta^{(n)} c_0 is the nth finite-difference operator (equation 5.1.6), with \Delta^{(0)} c_0 \equiv c_0, and g^{(n)} is the nth derivative of g(z). The usual Euler transformation (equation 5.1.5 with n = 0) can be obtained, for example, by substituting

    g(z) = \frac{1}{1+z} = 1 - z + z^2 - z^3 + \cdots        (5.1.12)

into equation (5.1.11), and then setting z = 1.
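Returning to eulsum, here is a usage sketch of ours showing the calling convention for the alternating series \ln 2 = 1 - 1/2 + 1/3 - \cdots; each call supplies the jth term with its sign, as the routine's comment requires:

#include <stdio.h>
#include <math.h>

void eulsum(float *sum, float term, int jterm, float wksp[]);

int main(void)
{
    int j;
    float sum=0.0,wksp[41];             /* 1-based workspace, up to 40 terms */

    for (j=1;j<=40;j++)
        eulsum(&sum,(j % 2 ? 1.0 : -1.0)/j,j,wksp);
    printf("eulsum gives %.7f   ln 2 = %.7f\n",sum,log(2.0));
    return 0;
}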

Sometimes you will want to compute a function from a series representation even when the computation is not efficient. For example, you may be using the values obtained to fit the function to an approximating form that you will use subsequently (cf. §5.8). If you are summing very large numbers of slowly convergent terms, pay attention to roundoff errors! In floating-point representation it is more accurate to sum a list of numbers in the order starting with the smallest one, rather than starting with the largest one. It is even better to group terms pairwise, then in pairs of pairs, etc., so that all additions involve operands of comparable magnitude.

CITED REFERENCES AND FURTHER READING:
Goodwin, E.T. (ed.) 1961, Modern Computing Methods, 2nd ed. (New York: Philosophical Library), Chapter 13 [van Wijngaarden's transformations]. [1]
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), Chapter 3.
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §3.6.
Mathews, J., and Walker, R.L. 1970, Mathematical Methods of Physics, 2nd ed. (Reading, MA: W.A. Benjamin/Addison-Wesley), §2.3. [2]

5.2 Evaluation of Continued Fractions

Continued fractions are often powerful ways of evaluating functions that occur in scientific applications. A continued fraction looks like this:

    f(x) = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \cfrac{a_4}{b_4 + \cfrac{a_5}{b_5 + \cdots}}}}}        (5.2.1)

Printers prefer to write this as

    f(x) = b_0 + \frac{a_1}{b_1 +}\,\frac{a_2}{b_2 +}\,\frac{a_3}{b_3 +}\,\frac{a_4}{b_4 +}\,\frac{a_5}{b_5 +} \cdots        (5.2.2)

In either (5.2.1) or (5.2.2), the a's and b's can themselves be functions of x, usually linear or quadratic monomials at worst (i.e., constants times x or times x^2). For example, the continued fraction representation of the tangent function is

    \tan x = \frac{x}{1 -}\,\frac{x^2}{3 -}\,\frac{x^2}{5 -}\,\frac{x^2}{7 -} \cdots        (5.2.3)

Continued fractions frequently converge much more rapidly than power series expansions, and in a much larger domain in the complex plane (not necessarily including the domain of convergence of the series, however). Sometimes the continued fraction converges best where the series does worst, although this is not a general rule.

Blanch [1] gives a good review of the most useful convergence tests for continued fractions.

There are standard techniques, including the important quotient-difference algorithm, for going back and forth between continued fraction approximations, power series approximations, and rational function approximations. Consult Acton [2] for an introduction to this subject, and Fike [3] for further details and references.

How do you tell how far to go when evaluating a continued fraction? Unlike a series, you can't just evaluate equation (5.2.1) from left to right, stopping when the change is small. Written in the form of (5.2.1), the only way to evaluate the continued fraction is from right to left, first (blindly!) guessing how far out to start. This is not the right way.

The right way is to use a result that relates continued fractions to rational approximations, and that gives a means of evaluating (5.2.1) or (5.2.2) from left to right. Let f_n denote the result of evaluating (5.2.2) with coefficients through a_n and b_n. Then

    f_n = \frac{A_n}{B_n}        (5.2.4)

where A_n and B_n are given by the following recurrence:

    A_{-1} \equiv 1 \qquad B_{-1} \equiv 0
    A_0 \equiv b_0 \qquad B_0 \equiv 1
    A_j = b_j A_{j-1} + a_j A_{j-2} \qquad B_j = b_j B_{j-1} + a_j B_{j-2}, \qquad j = 1, 2, \ldots, n        (5.2.5)

This method was invented by J. Wallis in 1655 (!), and is discussed in his Arithmetica Infinitorum [4]. You can easily prove it by induction.

In practice, this algorithm has some unattractive features: The recurrence (5.2.5) frequently generates very large or very small values for the partial numerators and denominators A_j and B_j. There is thus the danger of overflow or underflow of the floating-point representation. However, the recurrence (5.2.5) is linear in the A's and B's. At any point you can rescale the currently saved two levels of the recurrence, e.g., divide A_j, B_j, A_{j-1}, and B_{j-1} all by B_j. This incidentally makes A_j = f_j and is convenient for testing whether you have gone far enough: See if f_j and f_{j-1} from the last iteration are as close as you would like them to be. (If B_j happens to be zero, which can happen, just skip the renormalization for this cycle. A fancier level of optimization is to renormalize only when an overflow is imminent, saving the unnecessary divides. All this complicates the program logic.)

Two newer algorithms have been proposed for evaluating continued fractions. Steed's method does not use A_j and B_j explicitly, but only the ratio D_j = B_{j-1}/B_j. One calculates D_j and \Delta f_j = f_j - f_{j-1} recursively using

    D_j = 1/(b_j + a_j D_{j-1})        (5.2.6)

    \Delta f_j = (b_j D_j - 1)\,\Delta f_{j-1}        (5.2.7)

Steed's method (see, e.g., [5]) avoids the need for rescaling of intermediate results.

However, for certain continued fractions you can occasionally run into a situation where the denominator in (5.2.6) approaches zero, so that D_j and \Delta f_j are very large. The next \Delta f_{j+1} will typically cancel this large change, but with loss of accuracy in the numerical running sum of the f_j's. It is awkward to program around this, so Steed's method can be recommended only for cases where you know in advance that no denominator can vanish. We will use it for a special purpose in the routine bessik (§6.7).

The best general method for evaluating continued fractions seems to be the modified Lentz's method [6]. The need for rescaling intermediate results is avoided by using both the ratios

    C_j = A_j/A_{j-1}, \qquad D_j = B_{j-1}/B_j        (5.2.8)

and calculating f_j by

    f_j = f_{j-1} C_j D_j        (5.2.9)

From equation (5.2.5), one easily shows that the ratios satisfy the recurrence relations

    D_j = 1/(b_j + a_j D_{j-1}), \qquad C_j = b_j + a_j/C_{j-1}        (5.2.10)

In this algorithm there is the danger that the denominator in the expression for D_j, or the quantity C_j itself, might approach zero. Either of these conditions invalidates (5.2.10). However, Thompson and Barnett [5] show how to modify Lentz's algorithm to fix this: Just shift the offending term by a small amount, e.g., 10^{-30}. If you work through a cycle of the algorithm with this prescription, you will see that f_{j+1} is accurately calculated.

In detail, the modified Lentz's algorithm is this:
• Set f_0 = b_0; if b_0 = 0 set f_0 = tiny.
• Set C_0 = f_0.
• Set D_0 = 0.
• For j = 1, 2, ...
    Set D_j = b_j + a_j D_{j-1}.
    If D_j = 0, set D_j = tiny.
    Set C_j = b_j + a_j/C_{j-1}.
    If C_j = 0, set C_j = tiny.
    Set D_j = 1/D_j.
    Set \Delta_j = C_j D_j.
    Set f_j = f_{j-1} \Delta_j.
    If |\Delta_j - 1| < eps then exit.

Here eps is your floating-point precision, and tiny should be smaller than typical values of eps |b_j| (e.g., the 10^{-30} suggested above).
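Here is the algorithm in code, as a sketch of our own (tancf is not an NR listing), applied to the continued fraction (5.2.3) for tan x; in the notation of (5.2.2), b_0 = 0, a_1 = x, and thereafter a_j = -x^2 with b_j = 2j - 1:

#include <math.h>

#define TINY 1.0e-30
#define EPS  1.0e-7

float tancf(float x)
/* tan(x) by the modified Lentz method applied to eq. (5.2.3); our sketch. */
{
    int j;
    float aj,bj,c,d,del,f;

    f=TINY;                         /* b_0 = 0, so start f_0 at tiny */
    c=f;
    d=0.0;
    for (j=1;j<=100;j++) {
        aj=(j == 1 ? x : -x*x);     /* a_1 = x; a_j = -x^2 for j >= 2 */
        bj=2*j-1;                   /* b_j = 2j - 1 */
        d=bj+aj*d;
        if (d == 0.0) d=TINY;
        c=bj+aj/c;
        if (c == 0.0) c=TINY;
        d=1.0/d;
        del=c*d;
        f *= del;
        if (fabs(del-1.0) < EPS) return f;
    }
    return f;                       /* no convergence in 100 terms (e.g., near a pole) */
}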

Manipulating Continued Fractions

Several important properties of continued fractions can be used to rewrite them in forms that can speed up numerical computation. An equivalence transformation

    a_n \to \lambda a_n, \qquad b_n \to \lambda b_n, \qquad a_{n+1} \to \lambda a_{n+1}        (5.2.11)

leaves the value of a continued fraction unchanged. By a suitable choice of the scale factor \lambda you can often simplify the form of the a's and the b's. Of course, you can carry out successive equivalence transformations, possibly with different \lambda's, on successive terms of the continued fraction.

The even and odd parts of a continued fraction are continued fractions whose successive convergents are f_{2n} and f_{2n+1}, respectively. Their main use is that they converge twice as fast as the original continued fraction, and so if their terms are not much more complicated than the terms in the original there can be a big savings in computation. The formula for the even part of (5.2.2) is

    f_{\rm even} = d_0 + \frac{c_1}{d_1 +}\,\frac{c_2}{d_2 +} \cdots        (5.2.12)

where in terms of intermediate variables

    \alpha_1 = \frac{a_1}{b_1}
    \alpha_n = \frac{a_n}{b_n b_{n-1}}, \qquad n \ge 2        (5.2.13)

we have

    d_0 = b_0, \qquad c_1 = \alpha_1, \qquad d_1 = 1 + \alpha_2
    c_n = -\alpha_{2n-1}\,\alpha_{2n-2}, \qquad d_n = 1 + \alpha_{2n-1} + \alpha_{2n}, \qquad n \ge 2        (5.2.14)

You can find the similar formula for the odd part in the review by Blanch [1]. Often a combination of the transformations (5.2.14) and (5.2.11) is used to get the best form for numerical work. We will make frequent use of continued fractions in the next chapter.

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §3.10.
Blanch, G. 1964, SIAM Review, vol. 6, pp. 383–421. [1]
Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 11. [2]
Cuyt, A., and Wuytack, L. 1987, Nonlinear Methods in Numerical Analysis (Amsterdam: North-Holland), Chapter 1.
Fike, C.T. 1968, Computer Evaluation of Mathematical Functions (Englewood Cliffs, NJ: Prentice-Hall), §§8.2, 10.4, and 10.5. [3]
Wallis, J. 1695, in Opera Mathematica, vol. 1, p. 355, Oxoniae e Theatro Sheldoniano. Reprinted by Georg Olms Verlag, Hildesheim, New York (1972). [4]

Thompson, I.J., and Barnett, A.R. 1986, Journal of Computational Physics, vol. 64, pp. 490–509. [5]
Lentz, W.J. 1976, Applied Optics, vol. 15, pp. 668–671. [6]
Jones, W.B. 1973, in Padé Approximants and Their Applications, P.R. Graves-Morris, ed. (London: Academic Press), p. 125. [7]

5.3 Polynomials and Rational Functions

A polynomial of degree N is represented numerically as a stored array of coefficients, c[j] with j = 0,...,N. We will always take c[0] to be the constant term in the polynomial, c[N] the coefficient of x^N; but of course other conventions are possible. There are two kinds of manipulations that you can do with a polynomial: numerical manipulations (such as evaluation), where you are given the numerical value of its argument, or algebraic manipulations, where you want to transform the coefficient array in some way without choosing any particular argument. Let's start with the numerical.

We assume that you know enough never to evaluate a polynomial this way:

    p=c[0]+c[1]*x+c[2]*x*x+c[3]*x*x*x+c[4]*x*x*x*x;

or (even worse!),

    p=c[0]+c[1]*x+c[2]*pow(x,2.0)+c[3]*pow(x,3.0)+c[4]*pow(x,4.0);

Come the (computer) revolution, all persons found guilty of such criminal behavior will be summarily executed, and their programs won't be! It is a matter of taste, however, whether to write

    p=c[0]+x*(c[1]+x*(c[2]+x*(c[3]+x*c[4])));

or

    p=(((c[4]*x+c[3])*x+c[2])*x+c[1])*x+c[0];

If the number of coefficients c[0..n] is large, one writes

    p=c[n];
    for(j=n-1;j>=0;j--) p=p*x+c[j];

or

    p=c[j=n];
    while (j>0) p=p*x+c[--j];

Another useful trick is for evaluating a polynomial P(x) and its derivative dP(x)/dx simultaneously:

    p=c[n];
    dp=0.0;
    for(j=n-1;j>=0;j--) {dp=dp*x+p; p=p*x+c[j];}

or

    p=c[j=n];
    dp=0.0;
    while (j>0) {dp=dp*x+p; p=p*x+c[--j];}

which yields the polynomial as p and its derivative as dp.

The above trick, which is basically synthetic division [1,2], generalizes to the evaluation of the polynomial and nd of its derivatives simultaneously:

void ddpoly(float c[], int nc, float x, float pd[], int nd)
Given the nc+1 coefficients of a polynomial of degree nc as an array c[0..nc] with c[0]
being the constant term, and given a value x, and given a value nd>1, this routine returns the
polynomial evaluated at x as pd[0] and nd derivatives as pd[1..nd].
{
    int nnd,j,i;
    float cnst=1.0;

    pd[0]=c[nc];
    for (j=1;j<=nd;j++) pd[j]=0.0;
    for (i=nc-1;i>=0;i--) {
        nnd=(nd < (nc-i) ? nd : nc-i);
        for (j=nnd;j>=1;j--)
            pd[j]=pd[j]*x+pd[j-1];
        pd[0]=pd[0]*x+c[i];
    }
    for (i=2;i<=nd;i++) {       After the first derivative, factorial constants come in.
        cnst *= i;
        pd[i] *= cnst;
    }
}

As a curiosity, you might be interested to know that polynomials of degree n > 3 can be evaluated in fewer than n multiplications, at least if you are willing to precompute some auxiliary coefficients and, in some cases, do an extra addition. For example, the polynomial

    P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4        (5.3.1)

where a_4 > 0, can be evaluated with 3 multiplications and 5 additions as follows:

    P(x) = [(Ax + B)^2 + Ax + C]\,[(Ax + B)^2 + D] + E        (5.3.2)

where A, B, C, D, and E are to be precomputed by

    A = (a_4)^{1/4}
    B = \frac{a_3 - A^3}{4A^3}
    D = 3B^2 + 8B^3 + \frac{a_1 A - 2a_2 B}{A^2}        (5.3.3)
    C = \frac{a_2}{A^2} - 2B - 6B^2 - D
    E = a_0 - B^4 - B^2(C + D) - CD
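As a check of this bookkeeping, here is a small sketch of our own (not an NR listing) that precomputes A through E from (5.3.3) for one quartic and compares the 3-multiply form (5.3.2) against Horner's rule; the temporary t = Ax makes the three multiplies explicit:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double a[5]={1.0,-2.0,3.0,-4.0,5.0};     /* a_0..a_4, with a_4 > 0 */
    double A,B,C,D,E,t,q,p,ph,x=1.7;

    A=pow(a[4],0.25);                                  /* eq. (5.3.3) */
    B=(a[3]-A*A*A)/(4.0*A*A*A);
    D=3.0*B*B+8.0*B*B*B+(a[1]*A-2.0*a[2]*B)/(A*A);
    C=a[2]/(A*A)-2.0*B-6.0*B*B-D;
    E=a[0]-B*B*B*B-B*B*(C+D)-C*D;
    t=A*x;                                             /* multiply 1 */
    q=(t+B)*(t+B);                                     /* multiply 2 */
    p=(q+t+C)*(q+D)+E;                                 /* multiply 3 */
    ph=(((a[4]*x+a[3])*x+a[2])*x+a[1])*x+a[0];         /* Horner, for comparison */
    printf("eq. (5.3.2): %.10f   Horner: %.10f\n",p,ph);
    return 0;
}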

Fifth degree polynomials can be evaluated in 4 multiplies and 5 adds; sixth degree polynomials can be evaluated in 4 multiplies and 7 adds; if any of this strikes you as interesting, consult references [3-5]. The subject has something of the same entertaining, if impractical, flavor as that of fast matrix multiplication, discussed in §2.11.

Turn now to algebraic manipulations. You multiply a polynomial of degree n - 1 (array of range [0..n-1]) by a monomial factor x - a by a bit of code like the following,

    c[n]=c[n-1];
    for (j=n-1;j>=1;j--) c[j]=c[j-1]-c[j]*a;
    c[0] *= (-a);

Likewise, you divide a polynomial of degree n by a monomial factor x - a (synthetic division again) using

    rem=c[n];
    c[n]=0.0;
    for(i=n-1;i>=0;i--) {
        swap=c[i];
        c[i]=rem;
        rem=swap+rem*a;
    }

which leaves you with a new polynomial array and a numerical remainder rem.

Multiplication of two general polynomials involves straightforward summing of the products, each involving one coefficient from each polynomial. Division of two general polynomials, while it can be done awkwardly in the fashion taught using pencil and paper, is susceptible to a good deal of streamlining. Witness the following routine based on the algorithm in [3].

void poldiv(float u[], int n, float v[], int nv, float q[], float r[])
Given the n+1 coefficients of a polynomial of degree n in u[0..n], and the nv+1 coefficients
of another polynomial of degree nv in v[0..nv], divide the polynomial u by the polynomial
v ("u"/"v") giving a quotient polynomial whose coefficients are returned in q[0..n], and a
remainder polynomial whose coefficients are returned in r[0..n]. The elements r[nv..n]
and q[n-nv+1..n] are returned as zero.
{
    int k,j;

    for (j=0;j<=n;j++) {
        r[j]=u[j];
        q[j]=0.0;
    }
    for (k=n-nv;k>=0;k--) {
        q[k]=r[nv+k]/v[nv];
        for (j=nv+k-1;j>=k;j--) r[j] -= q[k]*v[j-k];
    }
    for (j=nv;j<=n;j++) r[j]=0.0;
}
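For instance, in this usage sketch of ours, dividing x^3 - 1 by x - 1 should return the quotient 1 + x + x^2 and a zero remainder:

#include <stdio.h>

void poldiv(float u[], int n, float v[], int nv, float q[], float r[]);

int main(void)
{
    float u[4]={-1.0,0.0,0.0,1.0};      /* x^3 - 1, constant term first */
    float v[2]={-1.0,1.0};              /* x - 1 */
    float q[4],r[4];
    int j;

    poldiv(u,3,v,1,q,r);
    for (j=0;j<=3;j++) printf("q[%d]=%g   r[%d]=%g\n",j,q[j],j,r[j]);
    return 0;                           /* expect q = {1,1,1,0}, r = {0,0,0,0} */
}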

Rational Functions

You evaluate a rational function like

    R(x) = \frac{P_\mu(x)}{Q_\nu(x)} = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu}        (5.3.4)

in the obvious way, namely as two separate polynomials followed by a divide. As a matter of convention one usually chooses q_0 = 1, obtained by dividing numerator and denominator by any other q_0. It is often convenient to have both sets of coefficients stored in a single array, and to have a standard function available for doing the evaluation:

double ratval(double x, double cof[], int mm, int kk)
Given mm, kk, and cof[0..mm+kk], evaluate and return the rational function (cof[0] +
cof[1]x + ... + cof[mm]x^mm)/(1 + cof[mm+1]x + ... + cof[mm+kk]x^kk).
{
    int j;
    double sumd,sumn;       Note precision! Change to float if desired.

    for (sumn=cof[mm],j=mm-1;j>=0;j--) sumn=sumn*x+cof[j];
    for (sumd=0.0,j=mm+kk;j>=mm+1;j--) sumd=(sumd+cof[j])*x;
    return sumn/(1.0+sumd);
}

CITED REFERENCES AND FURTHER READING:
Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), pp. 183, 190. [1]
Mathews, J., and Walker, R.L. 1970, Mathematical Methods of Physics, 2nd ed. (Reading, MA: W.A. Benjamin/Addison-Wesley), pp. 361–363. [2]
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), §4.6. [3]
Fike, C.T. 1968, Computer Evaluation of Mathematical Functions (Englewood Cliffs, NJ: Prentice-Hall), Chapter 4.
Winograd, S. 1970, Communications on Pure and Applied Mathematics, vol. 23, pp. 165–179. [4]
Kronsjö, L. 1987, Algorithms: Their Complexity and Efficiency, 2nd ed. (New York: Wiley). [5]

5.4 Complex Arithmetic

As we mentioned in §1.2, the lack of built-in complex arithmetic in C is a nuisance for numerical work. Even in languages like FORTRAN that have complex data types, it is disconcertingly common to encounter complex operations that produce overflows or underflows when both the complex operands and the complex result are perfectly representable. This occurs, we think, because software companies assign inexperienced programmers to what they believe to be the perfectly trivial task of implementing complex arithmetic.

Actually, complex arithmetic is not quite trivial. Addition and subtraction are done in the obvious way, performing the operation separately on the real and imaginary parts of the operands. Multiplication can also be done in the obvious way, with 4 multiplications, one addition, and one subtraction,

    (a + ib)(c + id) = (ac - bd) + i(bc + ad)        (5.4.1)

(the addition before the i doesn't count; it just separates the real and imaginary parts notationally). But it is sometimes faster to multiply via

    (a + ib)(c + id) = (ac - bd) + i[(a + b)(c + d) - ac - bd]        (5.4.2)

which has only three multiplications (ac, bd, (a + b)(c + d)), plus two additions and three subtractions. The total operations count is higher by two, but multiplication is a slow operation on some machines.

While it is true that intermediate results in equations (5.4.1) and (5.4.2) can overflow even when the final result is representable, this happens only when the final answer is on the edge of representability. Not so for the complex modulus, if you are misguided enough to compute it as

    |a + ib| = \sqrt{a^2 + b^2} \qquad \text{(bad!)}        (5.4.3)

whose intermediate result will overflow if either a or b is as large as the square root of the largest representable number (e.g., 10^{19} as compared to 10^{38}). The right way to do the calculation is

    |a + ib| = \begin{cases} |a|\sqrt{1 + (b/a)^2} & |a| \ge |b| \\ |b|\sqrt{1 + (a/b)^2} & |a| < |b| \end{cases}        (5.4.4)

Complex division should use a similar trick to prevent avoidable overflows, underflow, or loss of precision,

    \frac{a + ib}{c + id} = \begin{cases} \dfrac{[a + b(d/c)] + i[b - a(d/c)]}{c + d(d/c)} & |c| \ge |d| \\[2ex] \dfrac{[a(c/d) + b] + i[b(c/d) - a]}{c(c/d) + d} & |c| < |d| \end{cases}        (5.4.5)

Of course you should calculate repeated subexpressions, like c/d or d/c, only once.

Complex square root is even more complicated, since we must both guard intermediate results, and also enforce a chosen branch cut (here taken to be the negative real axis). To take the square root of c + id, first compute

    w \equiv \begin{cases} 0 & c = d = 0 \\[1ex] \sqrt{|c|}\,\sqrt{\dfrac{1 + \sqrt{1 + (d/c)^2}}{2}} & |c| \ge |d| \\[2ex] \sqrt{|d|}\,\sqrt{\dfrac{|c/d| + \sqrt{1 + (c/d)^2}}{2}} & |c| < |d| \end{cases}        (5.4.6)

Then the answer is

    \sqrt{c + id} = \begin{cases} 0 & w = 0 \\[1ex] w + i\,\dfrac{d}{2w} & w \ne 0,\ c \ge 0 \\[2ex] \dfrac{|d|}{2w} + iw & w \ne 0,\ c < 0,\ d \ge 0 \\[2ex] \dfrac{|d|}{2w} - iw & w \ne 0,\ c < 0,\ d < 0 \end{cases}        (5.4.7)
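In code, equations (5.4.6) and (5.4.7) might look like the following sketch of ours (the fcomplex struct simply mirrors the convention of the book's complex-arithmetic routines; this is not the Appendix C listing itself):

#include <math.h>

typedef struct { float r,i; } fcomplex;

fcomplex Csqrt(fcomplex z)
/* Square root of z = c + id, per eqs. (5.4.6)-(5.4.7); branch cut on the
   negative real axis. A sketch, not the official Appendix C routine. */
{
    fcomplex ans;
    float x,y,w,r;

    if ((z.r == 0.0) && (z.i == 0.0)) {         /* c = d = 0 case */
        ans.r=ans.i=0.0;
        return ans;
    }
    x=fabs(z.r);
    y=fabs(z.i);
    if (x >= y) {                               /* |c| >= |d| branch of (5.4.6) */
        r=y/x;
        w=sqrt(x)*sqrt(0.5*(1.0+sqrt(1.0+r*r)));
    } else {                                    /* |c| < |d| branch */
        r=x/y;
        w=sqrt(y)*sqrt(0.5*(r+sqrt(1.0+r*r)));
    }
    if (z.r >= 0.0) {                           /* eq. (5.4.7), c >= 0 */
        ans.r=w;
        ans.i=z.i/(2.0*w);
    } else {                                    /* c < 0: sign follows d */
        ans.i=(z.i >= 0.0) ? w : -w;
        ans.r=z.i/(2.0*ans.i);
    }
    return ans;
}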

Routines implementing these algorithms are listed in Appendix C.

CITED REFERENCES AND FURTHER READING:
Midy, P., and Yakovlev, Y. 1991, Mathematics and Computers in Simulation, vol. 33, pp. 33–49.
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley) [see solutions to exercises 4.2.1.16 and 4.6.4.41].

5.5 Recurrence Relations and Clenshaw's Recurrence Formula

Many useful functions satisfy recurrence relations, e.g.,

    (n + 1)\,P_{n+1}(x) = (2n + 1)\,x\,P_n(x) - n\,P_{n-1}(x)        (5.5.1)

    J_{n+1}(x) = \frac{2n}{x}\,J_n(x) - J_{n-1}(x)        (5.5.2)

    n\,E_{n+1}(x) = e^{-x} - x\,E_n(x)        (5.5.3)

    \cos n\theta = 2\cos\theta\,\cos(n-1)\theta - \cos(n-2)\theta        (5.5.4)

    \sin n\theta = 2\cos\theta\,\sin(n-1)\theta - \sin(n-2)\theta        (5.5.5)

where the first three functions are Legendre polynomials, Bessel functions of the first kind, and exponential integrals, respectively. (For notation see [1].) These relations are useful for extending computational methods from two successive values of n to other values, either larger or smaller.

Equations (5.5.4) and (5.5.5) motivate us to say a few words about trigonometric functions. If your program's running time is dominated by evaluating trigonometric functions, you are probably doing something wrong. Trig functions whose arguments form a linear sequence \theta = \theta_0 + n\delta, n = 0, 1, 2, \ldots, are efficiently calculated by the following recurrence,

    \cos(\theta + \delta) = \cos\theta - [\alpha\cos\theta + \beta\sin\theta]
    \sin(\theta + \delta) = \sin\theta - [\alpha\sin\theta - \beta\cos\theta]        (5.5.6)

where \alpha and \beta are the precomputed coefficients

    \alpha \equiv 2\sin^2\!\left(\frac{\delta}{2}\right), \qquad \beta \equiv \sin\delta        (5.5.7)

The reason for doing things this way, rather than with the standard (and equivalent) identities for sums of angles, is that here \alpha and \beta do not lose significance if the incremental \delta is small. Likewise, the adds in equation (5.5.6) should be done in the order indicated by square brackets. We will use (5.5.6) repeatedly in Chapter 12, when we deal with Fourier transforms.
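As a sketch of ours (trigseq is a hypothetical name, not an NR routine), here is the recurrence tabulating the whole sequence with only one sin and one cos library call at startup:

#include <math.h>

void trigseq(double theta0, double delta, int n, double c[], double s[])
/* Tabulate c[j] = cos(theta0 + j*delta) and s[j] = sin(theta0 + j*delta),
   j = 0..n-1, by the recurrence (5.5.6) with coefficients (5.5.7). */
{
    int j;
    double alpha=2.0*sin(0.5*delta)*sin(0.5*delta);   /* eq. (5.5.7) */
    double beta=sin(delta);

    c[0]=cos(theta0);
    s[0]=sin(theta0);
    for (j=1;j<n;j++) {
        c[j]=c[j-1]-(alpha*c[j-1]+beta*s[j-1]);       /* eq. (5.5.6), adds done */
        s[j]=s[j-1]-(alpha*s[j-1]-beta*c[j-1]);       /* in the bracketed order */
    }
}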

Another trick, occasionally useful, is to note that both sin \theta and cos \theta can be calculated via a single call to tan:

    t \equiv \tan\frac{\theta}{2}, \qquad \cos\theta = \frac{1 - t^2}{1 + t^2}, \qquad \sin\theta = \frac{2t}{1 + t^2}        (5.5.8)

The cost of getting both sin and cos, if you need them, is thus the cost of tan plus 2 multiplies, 2 divides, and 2 adds. On machines with slow trig functions, this can be a savings. However, note that special treatment is required if \theta \to \pm\pi. And also note that many modern machines have very fast trig functions; so you should not assume that equation (5.5.8) is faster without testing.

Stability of Recurrences

You need to be aware that recurrence relations are not necessarily stable against roundoff error in the direction that you propose to go (either increasing n or decreasing n). A three-term linear recurrence relation

    y_{n+1} + a_n y_n + b_n y_{n-1} = 0, \qquad n = 1, 2, \ldots        (5.5.9)

has two linearly independent solutions, f_n and g_n say. Only one of these corresponds to the sequence of functions f_n that you are trying to generate. The other one g_n may be exponentially growing in the direction that you want to go, or exponentially damped, or exponentially neutral (growing or dying as some power law, for example). If it is exponentially growing, then the recurrence relation is of little or no practical use in that direction. This is the case, e.g., for (5.5.2) in the direction of increasing n, when x < n: you cannot generate Bessel functions of high n by forward recurrence on (5.5.2). More formally, if

    f_n/g_n \to 0 \quad \text{as} \quad n \to \infty        (5.5.10)

then f_n is called the minimal solution of the recurrence relation (5.5.9), and solutions like g_n are called dominant; a dominant solution can be computed by forward recurrence, but the minimal solution cannot. Abramowitz and Stegun [1] indicate the stable directions for many common recurrences, but no such list contains all

If, of the two independent solutions,

    f_n / g_n \to 0 \quad\text{as}\quad n \to \infty    (5.5.10)

then f_n is called the minimal solution of the recurrence relation (5.5.9); nonminimal solutions like g_n are called dominant solutions. A dominant solution can be computed by recurrence in the direction in which it grows, but the minimal solution cannot.

Standard references (e.g., [1]) tabulate the stable directions for many common recurrences, but no table covers all possible formulas, of course. Given a recurrence relation for some function f_n(x) you can test it yourself with about five minutes of (human) labor: For a fixed x in your range of interest, start the recurrence not with true values of f_j(x) and f_{j+1}(x), but (first) with the values 1 and 0, respectively, and then (second) with 0 and 1, respectively. Generate 10 or 20 terms of the recursive sequences in the direction that you want to go (increasing or decreasing from j), for each of the two starting conditions. Look at the difference between the corresponding members of the two sequences. If the differences stay of order unity (absolute value less than 10, say), then the recurrence is stable. If they increase slowly, then the recurrence may be mildly unstable but quite tolerably so. If they increase catastrophically, then there is an exponentially growing solution of the recurrence. If you know that the function that you want actually corresponds to the growing solution, then you can keep the recurrence formula anyway, e.g., the case of the Bessel function Y_n(x) for increasing n, see §6.5; if you don't know which solution your function corresponds to, you must at this point reject the recurrence formula. Notice that you can do this test before you go to the trouble of finding a numerical method for computing the two starting functions f_j(x) and f_{j+1}(x): stability is a property of the recurrence, not of the starting values.

An alternative heuristic procedure for testing stability is to replace the recurrence relation by a similar one that is linear with constant coefficients. For example, the relation (5.5.2) becomes

    y_{n+1} - 2\gamma y_n + y_{n-1} = 0    (5.5.11)

where \gamma \equiv n/x is treated as a constant. You solve such recurrence relations by trying solutions of the form y_n = a^n. Substituting into the above recurrence gives

    a^2 - 2\gamma a + 1 = 0 \qquad\text{or}\qquad a = \gamma \pm \sqrt{\gamma^2 - 1}    (5.5.12)

The recurrence is stable if |a| \le 1 for all solutions a. This holds (as you can verify) if |\gamma| \le 1 or n \le x. The recurrence (5.5.2) thus cannot be used, starting with J_0(x) and J_1(x), to compute J_n(x) for large n.

Possibly you would at this point like the security of some real theorems on this subject (although we ourselves always follow one of the heuristic procedures). Here are two theorems, due to Perron [2]:

Theorem A. If in (5.5.9) a_n \sim a n^\alpha and b_n \sim b n^\beta as n \to \infty, and \beta < 2\alpha, then

    g_{n+1}/g_n \sim -a\, n^\alpha, \qquad f_{n+1}/f_n \sim -(b/a)\, n^{\beta - \alpha}    (5.5.13)

and f_n is the minimal solution to (5.5.9).

Theorem B. Under the same conditions as Theorem A, but with \beta = 2\alpha, consider the characteristic polynomial

    t^2 + a t + b = 0    (5.5.14)

If the roots t_1 and t_2 of (5.5.14) have distinct moduli, |t_1| > |t_2| say, then

    g_{n+1}/g_n \sim t_1\, n^\alpha, \qquad f_{n+1}/f_n \sim t_2\, n^\alpha    (5.5.15)

and f_n is again the minimal solution to (5.5.9). Cases other than those in these two theorems are inconclusive for the existence of minimal solutions. (For more on the stability of recurrences, see [3].)

How do you proceed if the solution that you desire is the minimal solution? The answer lies in that old aphorism, that every cloud has a silver lining: If a recurrence relation is catastrophically unstable in one direction, then that (undesired) solution will decrease very rapidly in the reverse direction. This means that you can start with any seed values for the consecutive f_j and f_{j+1} and (when you have gone enough steps in the stable direction) you will converge to the sequence of functions that you want, times an unknown normalization factor. If there is some other way to normalize the sequence (e.g., by a formula for the sum of the f_n's), then this can be a practical means of function evaluation. The method is called Miller's algorithm [1,4]. An example often given uses equation (5.5.2) in just this way, along with the normalization formula

    1 = J_0(x) + 2 J_2(x) + 2 J_4(x) + 2 J_6(x) + \cdots    (5.5.16)

Incidentally, there is an important relation between three-term recurrence relations and continued fractions. Rewrite the recurrence relation (5.5.9) as

    \frac{y_n}{y_{n-1}} = -\frac{b_n}{a_n + y_{n+1}/y_n}    (5.5.17)

Iterating this equation, starting with n, gives

    \frac{y_n}{y_{n-1}} = -\frac{b_n}{a_n - \dfrac{b_{n+1}}{a_{n+1} - \cdots}}    (5.5.18)

Pincherle's Theorem [2] tells us that (5.5.18) converges if and only if (5.5.9) has a minimal solution f_n, in which case it converges to f_n/f_{n-1}. This result, usually for the case n = 1 and combined with some way to determine f_0, underlies many of the practical methods for computing special functions that we give in the next chapter.
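As a sketch of how (5.5.18) gets used (our illustration, not a book listing): for the Bessel recurrence (5.5.2), a_n = -2n/x and b_n = 1, so J_n(x)/J_{n-1}(x) = 1/(2n/x - 1/(2(n+1)/x - \cdots)). The fraction can be evaluated by truncating at some sufficiently deep level and working backward:

/* Continued fraction (5.5.18) for J_n(x)/J_{n-1}(x), truncated NLEV
   levels deep and evaluated backward. NLEV is a placeholder; real
   code would choose the depth adaptively. */
#define NLEV 40

double besselratio(int n, double x)
{
    double r=0.0;
    int k;

    for (k=n+NLEV;k>=n;k--)
        r=1.0/(2.0*k/x-r);
    return r;
}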

Clenshaw's Recurrence Formula

Clenshaw's recurrence formula [5] is an elegant and efficient way to evaluate a sum of coefficients times functions that obey a recurrence formula, e.g.,

    f(\theta) = \sum_{k=0}^{N} c_k \cos k\theta \qquad\text{or}\qquad f(x) = \sum_{k=0}^{N} c_k P_k(x)

Here is how it works: Suppose that the desired sum is

    f(x) = \sum_{k=0}^{N} c_k F_k(x)    (5.5.19)

and that F_k obeys the recurrence relation

    F_{n+1}(x) = \alpha(n,x)\, F_n(x) + \beta(n,x)\, F_{n-1}(x)    (5.5.20)

for some functions \alpha(n,x) and \beta(n,x). Now define the quantities y_k (k = N, N-1, \ldots, 1) by the following recurrence:

    y_{N+2} = y_{N+1} = 0
    y_k = \alpha(k,x)\, y_{k+1} + \beta(k+1,x)\, y_{k+2} + c_k \qquad (k = N, N-1, \ldots, 1)    (5.5.21)

If you solve equation (5.5.21) for c_k on the left, and then write out explicitly the sum (5.5.19), it will look (in part) like this:

    f(x) = \cdots
        + [y_8 - \alpha(8,x)\, y_9 - \beta(9,x)\, y_{10}]\, F_8(x)
        + [y_7 - \alpha(7,x)\, y_8 - \beta(8,x)\, y_9]\, F_7(x)
        + [y_6 - \alpha(6,x)\, y_7 - \beta(7,x)\, y_8]\, F_6(x)
        + [y_5 - \alpha(5,x)\, y_6 - \beta(6,x)\, y_7]\, F_5(x)
        + \cdots    (5.5.22)
        + [y_2 - \alpha(2,x)\, y_3 - \beta(3,x)\, y_4]\, F_2(x)
        + [y_1 - \alpha(1,x)\, y_2 - \beta(2,x)\, y_3]\, F_1(x)
        + [c_0 + \beta(1,x)\, y_2 - \beta(1,x)\, y_2]\, F_0(x)

Notice that we have added and subtracted \beta(1,x)\, y_2 in the last line. If you examine the terms containing a factor of y_8 in (5.5.22), you will find that they sum to zero as a consequence of the recurrence relation (5.5.20); similarly all the other y_k's down through y_2. The only surviving terms in (5.5.22) are

    f(x) = \beta(1,x)\, F_0(x)\, y_2 + F_1(x)\, y_1 + F_0(x)\, c_0    (5.5.23)

Equations (5.5.21) and (5.5.23) are Clenshaw's recurrence formula for doing the sum (5.5.19): You make one pass down through the y_k's using (5.5.21); when you have reached y_2 and y_1 you apply (5.5.23) to get the desired answer.

Clenshaw's recurrence as written above incorporates the coefficients c_k in a downward order, with k decreasing. At each stage, the effect of all previous c_k's is "remembered" as two coefficients which multiply the functions F_{k+1} and F_k (ultimately F_0 and F_1). If the functions F_k are small when k is large, and if the coefficients c_k are small when k is small, then the sum can be dominated by small F_k's. In this case the remembered coefficients will involve a delicate cancellation and there can be a catastrophic loss of significance. An example would be to sum the trivial series

    J_{15}(1) = 0 \times J_0(1) + 0 \times J_1(1) + \cdots + 0 \times J_{14}(1) + 1 \times J_{15}(1)    (5.5.24)

Here J_{15}, which is tiny, ends up represented as a canceling linear combination of J_0 and J_1, which are of order unity.
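To make the prescription concrete, here is a minimal sketch (ours, not an NR listing) of (5.5.21) and (5.5.23) for the Legendre sum f(x) = \sum_{k=0}^{N} c_k P_k(x), for which, from (5.5.1), \alpha(n,x) = (2n+1)x/(n+1) and \beta(n,x) = -n/(n+1):

/* Clenshaw's recurrence (5.5.21) and (5.5.23) for a Legendre series
   f(x) = sum_{k=0}^{N} c[k] P_k(x). */
double clenshaw_legendre(double c[], int N, double x)
{
    double y2=0.0,y1=0.0,y;            /* y_{k+2} and y_{k+1} */
    int k;

    for (k=N;k>=1;k--) {
        y=((2*k+1)*x/(k+1))*y1-((double)(k+1)/(k+2))*y2+c[k];
        y2=y1;
        y1=y;
    }
    /* Equation (5.5.23), with F_0(x) = 1, F_1(x) = x, beta(1,x) = -1/2: */
    return -0.5*y2+x*y1+c[0];
}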

The solution in such cases is to use an alternative Clenshaw recurrence that incorporates the c_k's in an upward direction. The relevant equations are

    y_{-2} = y_{-1} = 0    (5.5.25)

    y_k = \frac{1}{\beta(k+1,x)} \left[ y_{k-2} - \alpha(k,x)\, y_{k-1} - c_k \right] \qquad (k = 0, 1, \ldots, N-1)    (5.5.26)

    f(x) = c_N F_N(x) - \beta(N,x)\, F_{N-1}(x)\, y_{N-1} - F_N(x)\, y_{N-2}    (5.5.27)

The rare case where equations (5.5.25)-(5.5.27) should be used instead of equations (5.5.21) and (5.5.23) can be detected automatically by testing whether the operands in the first sum in (5.5.23) are opposite in sign and nearly equal in magnitude. Other than in this special case, Clenshaw's recurrence is always stable, independent of whether the recurrence for the functions F_k is stable in the upward or downward direction.

CITED REFERENCES AND FURTHER READING:

Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), pp. xiii, 697. [1]

Gautschi, W. 1967, SIAM Review, vol. 9, pp. 24-82. [2]

Lakshmikantham, V., and Trigiante, D. 1988, Theory of Difference Equations: Numerical Methods and Applications (San Diego: Academic Press). [3]

Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), pp. 20ff. [4]

Clenshaw, C.W. 1962, Mathematical Tables, vol. 5, National Physical Laboratory (London: H.M. Stationery Office). [5]

Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §4.4.3, p. 111.

Goodwin, E.T. (ed.) 1961, Modern Computing Methods, 2nd ed. (New York: Philosophical Library), p. 76.

5.6 Quadratic and Cubic Equations

The roots of simple algebraic equations can be viewed as being functions of the equations' coefficients. We are taught these functions in elementary algebra. Yet, surprisingly many people don't know the right way to solve a quadratic equation with two real roots, or to obtain the roots of a cubic equation.

There are two ways to write the solution of the quadratic equation

    a x^2 + b x + c = 0    (5.6.1)

with real coefficients a, b, c, namely

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}    (5.6.2)

and

    x = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}    (5.6.3)

If you use either (5.6.2) or (5.6.3) to get the two roots, you are asking for trouble: If either a or c (or both) are small, then one of the roots will involve the subtraction of b from a very nearly equal quantity (the discriminant); you will get that root very inaccurately. The correct way to compute the roots is

    q \equiv -\frac{1}{2} \left[ b + \mathrm{sgn}(b)\, \sqrt{b^2 - 4ac} \right]    (5.6.4)

Then the two roots are

    x_1 = \frac{q}{a} \qquad\text{and}\qquad x_2 = \frac{c}{q}    (5.6.5)

If the coefficients a, b, c are complex rather than real, then the above formulas still hold, except that in equation (5.6.4) the sign of the square root should be chosen so as to make

    \mathrm{Re}\left( b^* \sqrt{b^2 - 4ac} \right) \ge 0    (5.6.6)

where Re denotes the real part and asterisk denotes complex conjugation.

Apropos of quadratic equations, this seems a convenient place to recall that the inverse hyperbolic functions sinh^{-1} and cosh^{-1} are in fact just logarithms of solutions to such equations,

    \sinh^{-1}(x) = \ln\left( x + \sqrt{x^2 + 1} \right)    (5.6.7)

    \cosh^{-1}(x) = \pm \ln\left( x + \sqrt{x^2 - 1} \right)    (5.6.8)

Equation (5.6.7) is numerically robust for x \ge 0. For negative x, use the symmetry \sinh^{-1}(-x) = -\sinh^{-1}(x). Equation (5.6.8) is of course valid only for x \ge 1.
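Returning to the quadratic: for real coefficients and a nonnegative discriminant, equations (5.6.4)-(5.6.5) translate into just a few lines. This sketch is ours, and it ignores the degenerate cases a = 0 and b = c = 0, which production code would have to handle:

#include <math.h>

/* Stable quadratic roots via equations (5.6.4)-(5.6.5). */
void quadroots(float a, float b, float c, float *x1, float *x2)
{
    float q = -0.5f*(b+(b >= 0.0f ? 1.0f : -1.0f)*sqrtf(b*b-4.0f*a*c));

    *x1=q/a;
    *x2=c/q;
}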

For the cubic equation

    x^3 + a x^2 + b x + c = 0    (5.6.9)

with real or complex coefficients a, b, c, first compute

    Q \equiv \frac{a^2 - 3b}{9} \qquad\text{and}\qquad R \equiv \frac{2a^3 - 9ab + 27c}{54}    (5.6.10)

If Q and R are real (always true when a, b, c are real) and R^2 < Q^3, then the cubic equation has three real roots. Find them by computing

    \theta = \arccos\left( R / \sqrt{Q^3} \right)    (5.6.11)

in terms of which the three roots are

    x_1 = -2\sqrt{Q}\, \cos\left( \frac{\theta}{3} \right) - \frac{a}{3}
    x_2 = -2\sqrt{Q}\, \cos\left( \frac{\theta + 2\pi}{3} \right) - \frac{a}{3}    (5.6.12)
    x_3 = -2\sqrt{Q}\, \cos\left( \frac{\theta - 2\pi}{3} \right) - \frac{a}{3}

(This equation first appears in Chapter VI of François Viète's treatise "De emendatione," published in 1615!)

Otherwise, compute

    A = -\left[ R + \sqrt{R^2 - Q^3} \right]^{1/3}    (5.6.13)

where the sign of the square root is chosen to make

    \mathrm{Re}\left( R^* \sqrt{R^2 - Q^3} \right) \ge 0    (5.6.14)

(asterisk again denoting complex conjugation). If Q and R are both real, equations (5.6.13)-(5.6.14) are equivalent to

    A = -\mathrm{sgn}(R) \left[ |R| + \sqrt{R^2 - Q^3} \right]^{1/3}    (5.6.15)

where the positive square root is assumed. Next compute

    B = \begin{cases} Q/A & (A \ne 0) \\ 0 & (A = 0) \end{cases}    (5.6.16)

in terms of which the three roots are

    x_1 = (A + B) - \frac{a}{3}    (5.6.17)

(the single real root when a, b, c are real) and

    x_2 = -\frac{1}{2}(A + B) - \frac{a}{3} + i\, \frac{\sqrt{3}}{2} (A - B)
    x_3 = -\frac{1}{2}(A + B) - \frac{a}{3} - i\, \frac{\sqrt{3}}{2} (A - B)    (5.6.18)

(in that same case, a complex conjugate pair). Equations (5.6.13)-(5.6.16) are arranged both to minimize roundoff error, and also (as pointed out by A.J. Glassman) to ensure that no choice of branch for the complex cube root can result in the spurious loss of a distinct root.

If you need to solve many cubic equations with only slightly different coefficients, it is more efficient to use Newton's method (§9.4).

CITED REFERENCES AND FURTHER READING:

Weast, R.C. (ed.) 1967, Handbook of Tables for Mathematics, 3rd ed. (Cleveland: The Chemical Rubber Co.), pp. 130-133.

Pachner, J. 1983, Handbook of Numerical Analysis Applications (New York: McGraw-Hill), §6.1.

McKelvey, J.P. 1984, American Journal of Physics, vol. 52, pp. 269-270; see also vol. 53, p. 775, and vol. 55, pp. 374-375.

5.7 Numerical Derivatives

Imagine that you have a procedure which computes a function f(x), and now you want to compute its derivative f'(x). Easy, right? The definition of the derivative, the limit as h \to 0 of

    f'(x) \approx \frac{f(x+h) - f(x)}{h}    (5.7.1)

practically suggests the program: Pick a small value h; evaluate f(x+h); you probably have f(x) already evaluated, but if not, do it too; finally apply equation (5.7.1). What more needs to be said?

Quite a lot, actually. Applied uncritically, the above procedure is almost guaranteed to produce inaccurate results. Applied properly, it can be the right way to compute a derivative only when the function f is fiercely expensive to compute, when you already have invested in computing f(x), and when, therefore, you want to get the derivative in no more than a single additional function evaluation. In such a situation, the remaining issue is to choose h properly, an issue we now discuss:

There are two sources of error in equation (5.7.1), truncation error and roundoff error. The truncation error comes from higher terms in the Taylor series expansion,

    f(x+h) = f(x) + h f'(x) + \frac{1}{2} h^2 f''(x) + \frac{1}{6} h^3 f'''(x) + \cdots    (5.7.2)

whence

    \frac{f(x+h) - f(x)}{h} = f' + \frac{1}{2} h f'' + \cdots    (5.7.3)

The roundoff error has various contributions. First there is roundoff error in h: Suppose, by way of an example, that you are at a point x = 10.3 and you blindly choose h = 0.0001. Neither x = 10.3 nor x + h = 10.30001 is a number with an exact representation in binary; each is therefore represented with some fractional error characteristic of the machine's floating-point format, ε_m, whose value in single precision may be ~ 10^{-7}. The error in the effective value of h, namely the difference between x + h and x as represented in the machine, is therefore on the order of ε_m x, which implies a fractional error in h of order ~ ε_m x/h ~ 10^{-2}! By equation (5.7.1) this immediately implies at least the same large fractional error in the derivative.

We arrive at Lesson 1: Always choose h so that x + h and x differ by an exactly representable number. This can usually be accomplished by the program steps

    temp = x + h
    h = temp - x    (5.7.4)

Some optimizing compilers, and some computers whose floating-point chips have higher internal accuracy than is stored externally, can foil this trick; if so, it is usually enough to declare temp as volatile, or else to call a dummy function donothing(temp) between the two equations (5.7.4). This forces temp into and out of addressable memory.
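Put together, equations (5.7.1) and (5.7.4) look like this (a sketch of ours, not an NR routine; the volatile qualifier implements the addressable-memory advice above):

/* One-sided first derivative, equation (5.7.1), with the stepsize
   trick of equation (5.7.4). */
float fdiff(float (*func)(float), float x, float h)
{
    volatile float temp=x+h;           /* force x+h through memory */

    h=temp-x;                          /* x+h and x now differ by an exactly representable amount */
    return ((*func)(x+h)-(*func)(x))/h;
}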

With h an "exact" number, the roundoff error in equation (5.7.1) is e_r ~ ε_f |f(x)/h|. Here ε_f is the fractional accuracy with which f is computed; for a simple function this may be comparable to the machine accuracy, ε_f ≈ ε_m, but for a complicated calculation with additional sources of inaccuracy it may be larger. The truncation error in equation (5.7.3) is on the order of e_t ~ |h f''(x)|. Varying h to minimize the sum e_r + e_t gives the optimal choice of h,

    h \sim \sqrt{\frac{\epsilon_f f}{f''}} \approx \sqrt{\epsilon_f}\; x_c    (5.7.5)

where x_c \equiv (f/f'')^{1/2} is the "curvature scale" of the function f, or "characteristic scale" over which it changes. In the absence of any other information, one often assumes x_c = x (except near x = 0, where some other estimate of the typical x scale should be used).

With the choice of equation (5.7.5), the fractional accuracy of the computed derivative is

    (e_r + e_t)/|f'| \sim \sqrt{\epsilon_f}\, (f f'')^{1/2} / |f'| \sim \epsilon_f^{1/2}    (5.7.6)

Here the last order-of-magnitude equality assumes that f, f', and f'' all share the same characteristic length scale, usually the case. One sees that the simple finite-difference equation (5.7.1) gives, at best, only the square root of the machine accuracy ε_m.

If you can afford two function evaluations for each derivative calculation, then it is significantly better to use the symmetrized form

    f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}    (5.7.7)

In this case, by equation (5.7.2), the truncation error is e_t ~ h^2 f'''. The roundoff error e_r is about the same as before. The optimal choice of h, by a short calculation analogous to the one above, is now

    h \sim \left( \frac{\epsilon_f f}{f'''} \right)^{1/3} \sim \epsilon_f^{1/3}\; x_c    (5.7.8)

and the fractional error is

    (e_r + e_t)/|f'| \sim \epsilon_f^{2/3}\, f^{2/3} (f''')^{1/3} / f' \sim \epsilon_f^{2/3}    (5.7.9)

which will typically be an order of magnitude (single precision) or two orders of magnitude (double precision) better than equation (5.7.6). We have arrived at Lesson 2: Choose h to be the correct power of ε_f or ε_m times a characteristic scale x_c.

You can easily derive the correct powers for other cases [1].
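As an illustration of Lesson 2 (our sketch, with the crude assumption x_c = |x|, or 1 near zero), combining the symmetrized form (5.7.7) with the stepsize choice (5.7.8) and the trick of (5.7.4) gives:

#include <math.h>
#include <float.h>

/* Centered first derivative, equation (5.7.7), with h chosen as the
   cube root of the machine epsilon times a characteristic scale,
   per equation (5.7.8). */
float dcentral(float (*func)(float), float x)
{
    float xc=(x != 0.0f ? fabsf(x) : 1.0f);   /* crude characteristic scale */
    float h=cbrtf(FLT_EPSILON)*xc;
    volatile float temp=x+h;                  /* stepsize trick, equation (5.7.4) */

    h=temp-x;
    return ((*func)(x+h)-(*func)(x-h))/(2.0f*h);
}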

For a function of two dimensions, for example, and the mixed derivative formula

    \frac{\partial^2 f}{\partial x\, \partial y} = \frac{[f(x+h,\, y+h) - f(x+h,\, y-h)] - [f(x-h,\, y+h) - f(x-h,\, y-h)]}{4h^2}    (5.7.10)

the correct scaling is h ~ ε_f^{1/4} x_c.

It is disappointing, certainly, that no simple finite-difference formula like equation (5.7.1) or (5.7.7) gives an accuracy comparable to the machine accuracy ε_m, or even to the lower accuracy ε_f to which f is evaluated. Are there no better methods?

Yes, there are. All, however, involve exploration of the function's behavior over scales comparable to x_c, plus some assumption of smoothness, or analyticity, so that the high-order terms in a Taylor expansion like equation (5.7.2) have some meaning. Such methods also involve multiple evaluations of the function f, so their increased accuracy must be weighed against increased cost.

The general idea of "Richardson's deferred approach to the limit" is particularly attractive. For numerical integrals, that idea leads to so-called Romberg integration (for review, see §4.3). For derivatives, one seeks to extrapolate, to h \to 0, the result of finite-difference calculations with smaller and smaller finite values of h. By the use of Neville's algorithm (§3.1), one uses each new finite-difference calculation to produce both an extrapolation of higher order, and also extrapolations of previous, lower, orders but with smaller scales h. Ridders [2] has given a nice implementation of this idea; the following program, dfridr, is based on his algorithm, modified by an improved termination criterion. Input to the routine is a function f (called func), a position x, and a largest stepsize h (more analogous to what we have called x_c above than to what we have called h). Output is the returned value of the derivative, and an estimate of its error, err.

#include <math.h>
#include "nrutil.h"
#define CON 1.4            /* Stepsize is decreased by CON at each iteration. */
#define CON2 (CON*CON)
#define BIG 1.0e30
#define NTAB 10            /* Sets maximum size of tableau. */
#define SAFE 2.0           /* Return when error is SAFE worse than the best so far. */

float dfridr(float (*func)(float), float x, float h, float *err)
/* Returns the derivative of a function func at a point x by Ridders' method of polynomial
   extrapolation. The value h is input as an estimated initial stepsize; it need not be small,
   but rather should be an increment in x over which func changes substantially. An estimate
   of the error in the derivative is returned as err. */
{
    int i,j;
    float errt,fac,hh,**a,ans;

    if (h == 0.0) nrerror("h must be nonzero in dfridr.");
    a=matrix(1,NTAB,1,NTAB);
    hh=h;
    a[1][1]=((*func)(x+hh)-(*func)(x-hh))/(2.0*hh);
    *err=BIG;
    for (i=2;i<=NTAB;i++) {
        /* Successive columns in the Neville tableau will go to smaller
           stepsizes and higher orders of extrapolation. */
        hh /= CON;
        a[1][i]=((*func)(x+hh)-(*func)(x-hh))/(2.0*hh);   /* Try new, smaller stepsize. */
        fac=CON2;
        for (j=2;j<=i;j++) {
            /* Compute extrapolations of various orders, requiring no new
               function evaluations. */
            a[j][i]=(a[j-1][i]*fac-a[j-1][i-1])/(fac-1.0);
            fac=CON2*fac;
            errt=FMAX(fabs(a[j][i]-a[j-1][i]),fabs(a[j][i]-a[j-1][i-1]));

            /* The error strategy is to compare each new extrapolation to one
               order lower, both at the present stepsize and the previous one. */
            if (errt <= *err) {        /* If error is decreased, save the improved answer. */
                *err=errt;
                ans=a[j][i];
            }
        }
        if (fabs(a[i][i]-a[i-1][i-1]) >= SAFE*(*err)) break;
        /* If higher order is worse by a significant factor SAFE, then quit early. */
    }
    free_matrix(a,1,NTAB,1,NTAB);
    return ans;
}

In dfridr, the number of evaluations of func is typically 6 to 12, but is allowed to be as great as 2 × NTAB. As a function of input h, it is typical for the accuracy to get better as h is made larger, until a sudden point is reached where nonsensical extrapolation produces an early return with a large error. You should therefore choose a fairly large value for h, but monitor the returned value err, decreasing h if it is not small. For functions whose characteristic x scale is of order unity, we typically take h to be a few tenths.

Besides Ridders' method, there are other possible techniques. If your function is fairly smooth, and you know that you will want to evaluate its derivative many times at arbitrary points in some interval, then it makes sense to construct a Chebyshev polynomial approximation to the function in that interval, and to evaluate the derivative directly from the resulting Chebyshev coefficients. This method is described in §§5.8-5.9, following.

Another technique applies when the function consists of data that is tabulated at equally spaced intervals, and perhaps also noisy. One might then want, at each point, to least-squares fit a polynomial of some degree M, using an additional number n_L of points to the left and some number n_R of points to the right of each desired x value. The estimated derivative is then the derivative of the resulting fitted polynomial. A very efficient way to do this construction is via Savitzky-Golay smoothing filters, which will be discussed later, in §14.8. There we will give a routine for getting filter coefficients that not only construct the fitting polynomial but, in the accumulation of a single sum of data points times filter coefficients, evaluate it as well. In fact, the routine given, savgol, has an argument ld that determines which derivative of the fitted polynomial is evaluated. For the first derivative, the appropriate setting is ld=1, and the value of the derivative is the accumulated sum divided by the sampling interval h.

CITED REFERENCES AND FURTHER READING:

Dennis, J.E., and Schnabel, R.B. 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs, NJ: Prentice-Hall), §§5.4-5.6. [1]

Ridders, C.J.F. 1982, Advances in Engineering Software, vol. 4, no. 2, pp. 75-76. [2]

5.8 Chebyshev Approximation

The Chebyshev polynomial of degree n is denoted T_n(x), and is given by the explicit formula

    T_n(x) = \cos(n \arccos x)    (5.8.1)

This may look trigonometric at first glance (and there is in fact a close relation between the Chebyshev polynomials and the discrete Fourier transform); however (5.8.1) can be combined with trigonometric identities to yield explicit expressions for T_n(x) (see Figure 5.8.1),

    T_0(x) = 1
    T_1(x) = x
    T_2(x) = 2x^2 - 1
    T_3(x) = 4x^3 - 3x    (5.8.2)
    T_4(x) = 8x^4 - 8x^2 + 1
    \cdots
    T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x), \qquad n \ge 1.

(There also exist inverse formulas for the powers of x in terms of the T_n's; see equations 5.11.2-5.11.3.)

The Chebyshev polynomials are orthogonal in the interval [-1, 1] over a weight (1 - x^2)^{-1/2}. In particular,

    \int_{-1}^{1} \frac{T_i(x)\, T_j(x)}{\sqrt{1 - x^2}}\, dx =
    \begin{cases} 0 & i \ne j \\ \pi/2 & i = j \ne 0 \\ \pi & i = j = 0 \end{cases}    (5.8.3)

The polynomial T_n(x) has n zeros in the interval [-1, 1], and they are located at the points

    x = \cos\left( \frac{\pi (k - \frac{1}{2})}{n} \right), \qquad k = 1, 2, \ldots, n    (5.8.4)

In this same interval there are n+1 extrema (maxima and minima), located at

    x = \cos\left( \frac{\pi k}{n} \right), \qquad k = 0, 1, \ldots, n    (5.8.5)

At all of the maxima T_n(x) = 1, while at all of the minima T_n(x) = -1; it is precisely this property that makes the Chebyshev polynomials so useful in polynomial approximation of functions.
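A few lines of C (ours, purely illustrative) confirm that the recurrence in (5.8.2) reproduces the trigonometric definition (5.8.1):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x=0.3,tm=1.0,t=x,tp;        /* T_0(x) and T_1(x) */
    int n;

    for (n=1;n<=6;n++) {
        tp=2.0*x*t-tm;                 /* recurrence in (5.8.2) */
        tm=t;
        t=tp;
        printf("T_%d: recurrence %+.12f  cos(n arccos x) %+.12f\n",
               n+1,t,cos((n+1)*acos(x)));
    }
    return 0;
}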

[Figure 5.8.1. Chebyshev polynomials T_0(x) through T_6(x). Note that T_j has j roots in the interval (-1, 1) and that all the polynomials are bounded between ±1.]

The Chebyshev polynomials satisfy a discrete orthogonality relation as well as the continuous one (5.8.3): If x_k (k = 1, \ldots, m) are the m zeros of T_m(x) given by (5.8.4), and if i, j < m, then

    \sum_{k=1}^{m} T_i(x_k)\, T_j(x_k) =
    \begin{cases} 0 & i \ne j \\ m/2 & i = j \ne 0 \\ m & i = j = 0 \end{cases}    (5.8.6)

It is not too difficult to combine equations (5.8.1), (5.8.4), and (5.8.6) to show the following: If f(x) is an arbitrary function in the interval [-1, 1], and if N coefficients c_j, j = 0, \ldots, N-1, are defined by

    c_j = \frac{2}{N} \sum_{k=1}^{N} f(x_k)\, T_j(x_k)
        = \frac{2}{N} \sum_{k=1}^{N} f\!\left[ \cos\left( \frac{\pi (k - \frac{1}{2})}{N} \right) \right] \cos\left( \frac{\pi j (k - \frac{1}{2})}{N} \right)    (5.8.7)

then the approximation formula

    f(x) \approx \left[ \sum_{k=0}^{N-1} c_k T_k(x) \right] - \frac{1}{2} c_0    (5.8.8)

is exact for x equal to all of the N zeros of T_N(x).

For a fixed N, equation (5.8.8) is a polynomial in x which approximates the function f(x) in the interval [-1, 1] (where all the zeros of T_N(x) are located). Why is this particular approximating polynomial better than any other one, exact on some other set of N points? The answer is not that (5.8.8) is necessarily more accurate than some other approximating polynomial of the same order N (for some specified definition of "accurate"), but rather that (5.8.8) can be truncated to a polynomial of lower degree m \ll N in a very graceful way, one that does yield the "most accurate" approximation of degree m (in a sense that can be made precise). Suppose N is so large that (5.8.8) is virtually a perfect approximation of f(x). Now consider the truncated approximation

    f(x) \approx \left[ \sum_{k=0}^{m-1} c_k T_k(x) \right] - \frac{1}{2} c_0    (5.8.9)

with the same c_k's, computed from (5.8.7). Since the T_k(x)'s are all bounded between ±1, the difference between (5.8.9) and (5.8.8) can be no larger than the sum of the neglected c_k's (k = m, \ldots, N-1). In fact, if the c_k's are rapidly decreasing (which is the typical case), then the error is dominated by c_m T_m(x), an oscillatory function with equal extrema distributed smoothly over the interval [-1, 1]. This smooth spreading out of the error is a very important property: The Chebyshev approximation (5.8.9) is very nearly the same polynomial as that holy grail of approximating polynomials, the minimax polynomial, which (among all polynomials of the same degree) has the smallest maximum deviation from the true function f(x). The minimax polynomial is very difficult to find; the Chebyshev approximating polynomial is almost identical and is very easy to compute!

So, given some (perhaps difficult) means of computing the function f(x), we now need algorithms for implementing (5.8.7) and (after inspection of the resulting c_k's and choice of a truncating value m) evaluating (5.8.9). The latter equation then becomes an easy way of computing f(x) for all subsequent time.

The first of these tasks is straightforward. A generalization of equation (5.8.7) that is here implemented is to allow the range of approximation to be between two arbitrary limits a and b, instead of just -1 to 1. This is effected by a change of variable

    y \equiv \frac{x - \frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}    (5.8.10)

and by the approximation of f(x) by a Chebyshev polynomial in y.

#include <math.h>
#include "nrutil.h"
#define PI 3.141592653589793

void chebft(float a, float b, float c[], int n, float (*func)(float))
/* Chebyshev fit: Given a function func, lower and upper limits of the interval [a,b], and a
   maximum degree n, this routine computes the n coefficients c[0..n-1] such that func(x) ≈
   [\sum_{k=0}^{n-1} c_k T_k(y)] - c_0/2, where y and x are related by (5.8.10). This routine
   is to be used with moderately large n (e.g., 30 or 50), the array of c's subsequently to be
   truncated at the smaller value m such that c_m and subsequent elements are negligible. */
{
    int k,j;
    float fac,bpa,bma,*f;

    f=vector(0,n-1);
    bma=0.5*(b-a);
    bpa=0.5*(b+a);
    for (k=0;k<n;k++) {                /* Evaluate the function at the n points required by (5.8.7). */
        float y=cos(PI*(k+0.5)/n);
        f[k]=(*func)(y*bma+bpa);
    }
    fac=2.0/n;
    for (j=0;j<n;j++) {
        double sum=0.0;                /* Accumulate the sum in double precision, a nicety you can ignore. */
        for (k=0;k<n;k++)
            sum += f[k]*cos(PI*j*(k+0.5)/n);
        c[j]=fac*sum;
    }
    free_vector(f,0,n-1);
}

Now that we have the Chebyshev coefficients, how do we evaluate the approximation? One could use the recurrence relation of equation (5.8.2) to generate values of T_k(y) from the two lowest orders while also accumulating the sum (5.8.9). It is better to use Clenshaw's recurrence formula (§5.5), effecting the two processes simultaneously. Applied to the Chebyshev series (5.8.9), the recurrence is

    d_{m+1} \equiv d_m \equiv 0
    d_j = 2y\, d_{j+1} - d_{j+2} + c_j, \qquad j = m-1, m-2, \ldots, 1    (5.8.11)
    f(y) \equiv d_0 = y\, d_1 - d_2 + \frac{1}{2} c_0

float chebev(float a, float b, float c[], int m, float x)
/* Chebyshev evaluation: All arguments are input. c[0..m-1] is an array of Chebyshev
   coefficients, the first m elements of c output from chebft (which must have been called
   with the same a and b). */
{
    void nrerror(char error_text[]);
    float d=0.0,dd=0.0,sv,y,y2;
    int j;

    if ((x-a)*(x-b) > 0.0) nrerror("x not in range in routine chebev");
    y2=2.0*(y=(2.0*x-a-b)/(b-a));      /* Change of variable. */
    for (j=m-1;j>=1;j--) {             /* Clenshaw's recurrence. */
        sv=d;
        d=y2*d-dd+c[j];
        dd=sv;
    }
    return y*d-dd+0.5*c[0];            /* Last step is different. */
}
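A typical calling sequence (our sketch; the choice of sin, the interval, and the truncation value 12 are arbitrary stand-ins) looks like this:

#include <stdio.h>
#include <math.h>
#define NUSE 30

void chebft(float a, float b, float c[], int n, float (*func)(float));
float chebev(float a, float b, float c[], int m, float x);

int main(void)
{
    float c[NUSE];

    chebft(0.0,2.0,c,NUSE,sinf);       /* fit sin(x) on [0,2] with 30 coefficients */
    /* After inspecting c[0..NUSE-1], truncate to m = 12 terms and evaluate: */
    printf("approx %g  true %g\n",chebev(0.0,2.0,c,12,1.234f),sin(1.234));
    return 0;
}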

If we are approximating an even function on the interval [-1, 1], its expansion will involve only even Chebyshev polynomials. It is wasteful to call chebev with all the odd coefficients zero [1]. Instead, using the half-angle identity for the cosine in equation (5.8.1), we get the relation

    T_{2n}(x) = T_n(2x^2 - 1)    (5.8.12)

Thus we can evaluate a series of even Chebyshev polynomials by calling chebev with the even coefficients stored consecutively in the array c, but with the argument x replaced by 2x^2 - 1.

An odd function will have an expansion involving only odd Chebyshev polynomials. It is best to rewrite it as an expansion for the function f(x)/x, which involves only even Chebyshev polynomials. This will give accurate values for f(x)/x near x = 0. The coefficients c'_n for f(x)/x can be found from those for f(x) by recurrence:

    c'_{N+1} = 0
    c'_{n-1} = 2 c_n - c'_{n+1}, \qquad n = N-1, N-3, \ldots    (5.8.13)

Equation (5.8.13) follows from the recurrence relation in equation (5.8.2).

If you insist on evaluating an odd Chebyshev series, the efficient way is to once again use chebev with x replaced by y = 2x^2 - 1, and with the odd coefficients stored consecutively in the array c. Now, however, you must also change the last formula in equation (5.8.11) to be

    f(x) = x \left[ (2y - 1)\, d_1 - d_2 + c_0 \right]    (5.8.14)

and change the corresponding line in chebev.

CITED REFERENCES AND FURTHER READING:

Clenshaw, C.W. 1962, Mathematical Tables, vol. 5, National Physical Laboratory (London: H.M. Stationery Office). [1]

Goodwin, E.T. (ed.) 1961, Modern Computing Methods, 2nd ed. (New York: Philosophical Library), Chapter 8.

Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §4.4.1, p. 104.

Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §6.5.2, p. 334.

Carnahan, B., Luther, H.A., and Wilkes, J.O. 1969, Applied Numerical Methods (New York: Wiley), §1.10, p. 39.

5.9 Derivatives or Integrals of a Chebyshev-approximated Function

If you have obtained the Chebyshev coefficients that approximate a function in a certain range (e.g., from chebft in §5.8), then it is a simple matter to transform them to Chebyshev coefficients corresponding to the derivative or integral of the function. Having done this, you can evaluate the derivative or integral just as if it were a function that you had Chebyshev-fitted ab initio.

The relevant formulas are these: If c_i, i = 0, \ldots, m-1 are the coefficients that approximate a function f in equation (5.8.9), C_i are the coefficients that approximate the indefinite integral of f, and c'_i are the coefficients that approximate the derivative of f, then

    C_i = \frac{c_{i-1} - c_{i+1}}{2i} \qquad (i > 0)    (5.9.1)

    c'_{i-1} = c'_{i+1} + 2 i c_i \qquad (i = m-1, m-2, \ldots, 1)    (5.9.2)

Equation (5.9.1) is augmented by an arbitrary choice of C_0, corresponding to an arbitrary constant of integration. Equation (5.9.2), which is a recurrence, is started with the values c'_m = c'_{m-1} = 0, corresponding to no information about the m+1st Chebyshev coefficient of the original function f.

Here are routines for implementing equations (5.9.1) and (5.9.2).

void chder(float a, float b, float c[], float cder[], int n)
/* Given a,b,c[0..n-1], as output from routine chebft §5.8, and given n, the desired degree
   of approximation (length of c to be used), this routine returns the array cder[0..n-1], the
   Chebyshev coefficients of the derivative of the function whose coefficients are c. */
{
    int j;
    float con;

    cder[n-1]=0.0;                     /* n-1 and n-2 are special cases. */
    cder[n-2]=2*(n-1)*c[n-1];
    for (j=n-3;j>=0;j--)
        cder[j]=cder[j+2]+2*(j+1)*c[j+1];   /* Equation (5.9.2). */
    con=2.0/(b-a);
    for (j=0;j<n;j++)                  /* Normalize to the interval b-a. */
        cder[j] *= con;
}

void chint(float a, float b, float c[], float cint[], int n)
/* Given a,b,c[0..n-1], as output from routine chebft §5.8, and given n, the desired degree
   of approximation (length of c to be used), this routine returns the array cint[0..n-1], the
   Chebyshev coefficients of the integral of the function whose coefficients are c. The constant
   of integration is set so that the integral vanishes at a. */
{
    int j;
    float sum=0.0,fac=1.0,con;

    con=0.25*(b-a);                    /* Factor that normalizes to the interval b-a. */

    for (j=1;j<=n-2;j++) {
        cint[j]=con*(c[j-1]-c[j+1])/j; /* Equation (5.9.1). */
        sum += fac*cint[j];            /* Accumulates the constant of integration. */
        fac = -fac;                    /* Will equal ±1. */
    }
    cint[n-1]=con*c[n-2]/(n-1);        /* Special case of (5.9.1) for n-1. */
    sum += fac*cint[n-1];
    cint[0]=2.0*sum;                   /* Set the constant of integration. */
}

Clenshaw-Curtis Quadrature

Since a smooth function's Chebyshev coefficients c_i decrease rapidly, generally exponentially, equation (5.9.1) is often quite efficient as the basis for a quadrature scheme. The routines chebft and chint, used in that order, can be followed by repeated calls to chebev if \int_a^x f(x)\, dx is required for many different values of x in the range a \le x \le b.

If only the single definite integral \int_a^b f(x)\, dx is required, then chint and chebev are replaced by the simpler formula, derived from equation (5.9.1),

    \int_a^b f(x)\, dx = (b-a) \left[ \frac{1}{2} c_0 - \frac{1}{3} c_2 - \frac{1}{15} c_4 - \cdots - \frac{1}{(2k+1)(2k-1)} c_{2k} - \cdots \right]    (5.9.3)

where the c_i's are as returned by chebft. The series can be truncated when c_{2k} becomes negligible, and the first neglected term gives an error estimate.

This scheme is known as Clenshaw-Curtis quadrature [1]. It is often combined with an adaptive choice of N, the number of Chebyshev coefficients calculated via equation (5.8.7), which is also the number of function evaluations of f(x). If a modest choice of N does not give a sufficiently small c_{2k} in equation (5.9.3), then a larger value is tried. In this adaptive case, it is even better to replace equation (5.8.7) by the so-called "trapezoidal" or Gauss-Lobatto (§4.5) variant,

    c_j = \frac{2}{N} \sum_{k=0}^{N}{}'' f\!\left[ \cos\left( \frac{\pi k}{N} \right) \right] \cos\left( \frac{\pi j k}{N} \right) \qquad j = 0, \ldots, N-1    (5.9.4)

where (N.B.!) the two primes signify that the first and last terms in the sum are to be multiplied by 1/2. If N is doubled in equation (5.9.4), then half of the new function evaluation points are identical to the old ones, allowing the previous function evaluations to be reused. This feature, plus the analytic weights and abscissas (cosine functions in 5.9.4), give Clenshaw-Curtis quadrature an edge over high-order adaptive Gaussian quadrature (cf. §4.5), which the method otherwise resembles.

If your problem forces you to large values of N, you should be aware that equation (5.9.4) can be evaluated rapidly, and simultaneously for all the values of j, by a fast cosine transform. (See §12.3, especially equation 12.3.17.) (We already remarked that the nontrapezoidal form (5.8.7) can also be done by fast cosine methods, cf. equation 12.3.22.)

CITED REFERENCES AND FURTHER READING:

Goodwin, E.T. (ed.) 1961, Modern Computing Methods, 2nd ed. (New York: Philosophical Library), pp. 78-79.

Clenshaw, C.W., and Curtis, A.R. 1960, Numerische Mathematik, vol. 2, pp. 197-205. [1]

5.10 Polynomial Approximation from Chebyshev Coefficients

You may well ask after reading the preceding two sections, "Must I store and evaluate my Chebyshev approximation as an array of Chebyshev coefficients for a transformed variable y? Can't I convert the c_k's into actual polynomial coefficients in the original variable x and have an approximation of the following form?"

    f(x) \approx \sum_{k=0}^{m-1} g_k x^k    (5.10.1)

Yes, you can do this (and we will give you the algorithm to do it), but we caution you against it: Evaluating equation (5.10.1), where the coefficient g's reflect an underlying Chebyshev approximation, usually requires more significant figures than evaluation of the Chebyshev sum directly (as by chebev). This is because the Chebyshev polynomials themselves exhibit a rather delicate cancellation: The leading coefficient of T_n(x), for example, is 2^{n-1}; other coefficients of T_n(x) are even bigger; yet they all manage to combine into a polynomial that lies between ±1. Only when m is no larger than 7 or 8 should you contemplate writing a Chebyshev fit as a direct polynomial, and even in those cases you should be willing to tolerate two or so significant figures less accuracy than the roundoff limit of your machine.

You get the g's in equation (5.10.1) from the c's output from chebft (suitably truncated at a modest value of m) by calling in sequence the following two procedures:

#include "nrutil.h"

void chebpc(float c[], float d[], int n)
/* Chebyshev polynomial coefficients. Given a coefficient array c[0..n-1], this routine
   generates a coefficient array d[0..n-1] such that \sum_{k=0}^{n-1} d_k y^k =
   [\sum_{k=0}^{n-1} c_k T_k(y)] - c_0/2. The method is Clenshaw's recurrence (5.8.11),
   but now applied algebraically rather than arithmetically. */
{
    int k,j;
    float sv,*dd;

    dd=vector(0,n-1);
    for (j=0;j<n;j++) d[j]=dd[j]=0.0;
    d[0]=c[n-1];
    for (j=n-2;j>=1;j--) {
        for (k=n-j;k>=1;k--) {
            sv=d[k];
            d[k]=2.0*d[k-1]-dd[k];
            dd[k]=sv;
        }
        sv=d[0];
        d[0] = -dd[0]+c[j];
        dd[0]=sv;
    }
    for (j=n-1;j>=1;j--) d[j]=d[j-1]-dd[j];
    d[0] = -dd[0]+0.5*c[0];
    free_vector(dd,0,n-1);
}

void pcshft(float a, float b, float d[], int n)
/* Polynomial coefficient shift. Given a coefficient array d[0..n-1], this routine generates a
   coefficient array g[0..n-1] such that \sum_{k=0}^{n-1} d_k y^k = \sum_{k=0}^{n-1} g_k x^k,
   where x and y are related by (5.8.10), i.e., the interval -1 < y < 1 is mapped to the
   interval a < x < b. The array g is returned in d. */
{
    int k,j;
    float fac,cnst;

    cnst=2.0/(b-a);
    fac=cnst;
    for (j=1;j<n;j++) {                /* First we rescale by the factor cnst... */
        d[j] *= fac;
        fac *= cnst;
    }
    cnst=0.5*(a+b);                    /* ...which is then redefined as the desired shift. */
    for (j=0;j<=n-2;j++)               /* We accomplish the shift by synthetic division, a */
        for (k=n-2;k>=j;k--)           /* miracle of high-school algebra. If you never */
            d[k] -= cnst*d[k+1];       /* learned it, go do so. You won't be sorry. */
}

CITED REFERENCES AND FURTHER READING:

Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), pp. 59, 182-183 [synthetic division].

5.11 Economization of Power Series

One particular application of Chebyshev methods, the economization of power series, is an occasionally useful technique, with a flavor of getting something for nothing. Suppose that you are already computing a function by the use of a convergent power series, for example

    f(x) \equiv 1 - \frac{x}{3!} + \frac{x^2}{5!} - \frac{x^3}{7!} + \cdots    (5.11.1)

(This function is actually \sin(\sqrt{x})/\sqrt{x}, but pretend you don't know that.) You might be doing a problem that requires evaluating the series many times in some particular interval, say [0, (2\pi)^2]. Everything is fine, except that the series requires a large number of terms before its error (approximated by the first neglected term, say) is tolerable. In our example, with x = (2\pi)^2, the first term smaller than 10^{-7} is x^{13}/(27!). This then approximates the error of the finite series whose last term is x^{12}/(25!).

Notice that because of the large exponent in x^{13}, the error is much smaller than 10^{-7} everywhere in the interval except at the very largest values of x. This is the feature that allows "economization": if we are willing to let the error elsewhere in the interval rise to about the same value that the first neglected term has at the extreme end of the interval, then we can replace the 13-term series by one that is significantly shorter. Here are the steps for doing so:

1. Change variables from x to y, as in equation (5.8.10), to map the x interval into -1 \le y \le 1.
2. Find the coefficients of the Chebyshev sum (like equation 5.8.8) that exactly equals your truncated power series (the one with enough terms for accuracy).
3. Truncate this Chebyshev series to a smaller number of terms, using the coefficient of the first neglected Chebyshev polynomial as an estimate of the error.

4. Convert back to a polynomial in y.
5. Change variables back to x.

All of these steps can be done numerically, given the coefficients of the original power series expansion. The first step is exactly the inverse of the routine pcshft (§5.10), which mapped a polynomial from y (in the interval [-1, 1]) to x (in the interval [a, b]). But since equation (5.8.10) is a linear relation between x and y, one can also use pcshft for the inverse. The inverse of

    pcshft(a, b, d, n)

turns out to be (you can check this)

    pcshft((-2-b-a)/(b-a), (2-b-a)/(b-a), d, n)

The second step requires the inverse operation to that done by the routine chebpc (which took Chebyshev coefficients into polynomial coefficients). The following routine, pccheb, accomplishes this, using the formula [1]

    x^k = \frac{1}{2^{k-1}} \left[ T_k(x) + \binom{k}{1} T_{k-2}(x) + \binom{k}{2} T_{k-4}(x) + \cdots \right]    (5.11.2)

where the last term depends on whether k is even or odd,

    \cdots + \binom{k}{(k-1)/2} T_1(x) \quad (k\ \text{odd}), \qquad \cdots + \frac{1}{2} \binom{k}{k/2} T_0(x) \quad (k\ \text{even}).    (5.11.3)

void pccheb(float d[], float c[], int n)
/* Inverse of routine chebpc: given an array of polynomial coefficients d[0..n-1], returns an
   equivalent array of Chebyshev coefficients c[0..n-1]. */
{
    int j,jm,jp,k;
    float fac,pow;

    pow=1.0;                           /* Will be powers of 2. */
    c[0]=2.0*d[0];
    for (k=1;k<n;k++) {                /* Loop over orders of x in the polynomial. */
        c[k]=0.0;                      /* Zero corresponding order of Chebyshev. */
        fac=d[k]/pow;
        jm=k;
        jp=1;
        for (j=k;j>=0;j-=2,jm--,jp++) {
            /* Increment this and lower orders of Chebyshev with the combinatorial
               coefficient times d[k]; see text for formula. */
            c[j] += fac;
            fac *= ((float)jm)/((float)jp);
        }
        pow += pow;
    }
}

The fourth and fifth steps are accomplished by the routines chebpc and pcshft, respectively. Here is how the procedure looks all together:

#define NFEW ..
#define NMANY ..
float *c,*d,*e,a,b;

/* Economize NMANY power series coefficients e[0..NMANY-1] in the range (a,b) into
   NFEW coefficients d[0..NFEW-1]. */

c=vector(0,NMANY-1);
d=vector(0,NFEW-1);
e=vector(0,NMANY-1);
pcshft((-2.0-b-a)/(b-a),(2.0-b-a)/(b-a),e,NMANY);
pccheb(e,c,NMANY);
...
/* Here one would normally examine the Chebyshev coefficients c[0..NMANY-1] to decide
   how small NFEW can be. */
chebpc(c,d,NFEW);
pcshft(a,b,d,NFEW);

In our example, by the way, the 8th through 10th Chebyshev coefficients turn out to be on the order of -7 × 10^{-6}, 3 × 10^{-7}, and -9 × 10^{-9}, so reasonable truncations (for single precision calculations) are somewhere in this range, yielding a polynomial with 8-10 terms instead of the original 13.

Replacing a 13-term polynomial with a (say) 10-term polynomial without any loss of accuracy: that does seem to be getting something for nothing. Is there some magic in this technique? Not really. The 13-term polynomial defined a function f(x). Equivalent to economizing the series, we could instead have evaluated f(x) at enough points to construct its Chebyshev approximation in the interval of interest, by the methods of §5.8. We would have obtained just the same lower-order polynomial. The principal lesson is that the rate of convergence of Chebyshev coefficients has nothing to do with the rate of convergence of power series coefficients; and it is the former that dictates the number of terms needed in a polynomial approximation. A function might have a divergent power series in some region of interest, but if the function itself is well-behaved, it will have perfectly good polynomial approximations. These can be found by the methods of §5.8, but not by economization of series. There is slightly less to economization of series than meets the eye.

CITED REFERENCES AND FURTHER READING:

Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 12.

Arfken, G. 1970, Mathematical Methods for Physicists, 2nd ed. (New York: Academic Press), p. 631. [1]

5.12 Padé Approximants

A Padé approximant, so called, is that rational function (of a specified order) whose power series expansion agrees with a given power series to the highest possible order. If the rational function is

    R(x) \equiv \frac{\sum_{k=0}^{M} a_k x^k}{1 + \sum_{k=1}^{N} b_k x^k}    (5.12.1)

then R(x) is said to be a Padé approximant to the series

    f(x) \equiv \sum_{k=0}^{\infty} c_k x^k    (5.12.2)

if

    R(0) = f(0)    (5.12.3)

and also

    \left. \frac{d^k}{dx^k} R(x) \right|_{x=0} = \left. \frac{d^k}{dx^k} f(x) \right|_{x=0}, \qquad k = 1, 2, \ldots, M+N    (5.12.4)

Equations (5.12.3) and (5.12.4) furnish M+N+1 equations for the unknowns a_0, \ldots, a_M and b_1, \ldots, b_N. The easiest way to see what these equations are is to equate (5.12.1) and (5.12.2), multiply both by the denominator of equation (5.12.1), and equate all powers of x that have either a's or b's in their coefficients. If we consider only the special case of a diagonal rational approximation, M = N (cf. §3.2), then we have a_0 = c_0, with the remaining a's and b's satisfying

    \sum_{m=1}^{N} b_m c_{N-m+k} = -c_{N+k}, \qquad k = 1, \ldots, N    (5.12.5)

    \sum_{m=0}^{k} b_m c_{k-m} = a_k, \qquad k = 1, \ldots, N    (5.12.6)

(note, in equation 5.12.1, that b_0 = 1). To solve these, start with equations (5.12.5), which are a set of linear equations for all the unknown b's. Although the set is in the form of a Toeplitz matrix (compare equation 2.8.8), experience shows that the equations are frequently close to singular, so that one should not solve them by the methods of §2.8, but rather by full LU decomposition. Additionally, it is a good idea to refine the solution by iterative improvement (routine mprove in §2.5) [1].

Once the b's are known, then equation (5.12.6) gives an explicit formula for the unknown a's, completing the solution.

Padé approximants are typically used when there is some unknown underlying function f(x). We suppose that you are able somehow to compute, perhaps by laborious analytic expansions, the values of f(x) and a few of its derivatives at x = 0: f(0), f'(0), f''(0), and so on. These are of course the first few coefficients in the power series expansion of f(x); but they are not necessarily getting small, and you have no idea where (or whether) the power series is convergent.

By contrast with techniques like Chebyshev approximation (§5.8) or economization of power series (§5.11) that only condense the information that you already know about a function, Padé approximants can give you genuinely new information about your function's values. It is sometimes quite mysterious how well this can work. (Like other mysteries in mathematics, it relates to analyticity.) An example will illustrate.

Imagine that, by extraordinary labors, you have ground out the first five terms in the power series expansion of an unknown function f(x),

    f(x) \approx 2 + \frac{1}{9} x + \frac{1}{81} x^2 - \frac{49}{8748} x^3 + \frac{175}{78732} x^4 + \cdots    (5.12.7)

(It is not really necessary that you know the coefficients in exact rational form; numerical values are just as good. We here write them as rationals to give you the impression that they derive from some side analytic calculation.)
Equation (5.12.7) is plotted as the curve labeled "power series" in Figure 5.12.1. One sees that for $x \gtrsim 4$ it is dominated by its largest, quartic, term. We now take the five coefficients in equation (5.12.7) and run them through the routine pade listed below. It returns five rational coefficients, three a's and two b's, for use in equation (5.12.1) with M = N = 2. The curve in the figure labeled "Padé" plots the resulting rational function. Note that both solid curves derive from the same five original coefficient values.
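In terms of calling sequence, the computation just described can be sketched as the following short driver. This is our own illustrative snippet, not one of the book's listings; it assumes that pade (listed below), ratval (§5.3), and the nrutil routines are compiled and linked in.

#include <stdio.h>

int main(void)
{
    void pade(double cof[], int n, float *resid);
    double ratval(double x, double cof[], int mm, int kk);
    /* The five power series coefficients of equation (5.12.7); n=2
       requests the diagonal M = N = 2 approximant. */
    double cof[5]={2.0,1.0/9.0,1.0/81.0,-49.0/8748.0,175.0/78732.0};
    float resid;
    double x;

    pade(cof,2,&resid);         /* cof now holds a0,a1,a2,b1,b2 */
    printf("residual norm = %e\n",resid);
    for (x=0.0;x<=10.0;x+=2.0)  /* tabulate R(x) of equation (5.12.1) */
        printf("%6.2f %12.6f\n",x,ratval(x,cof,2,2));
    return 0;
}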

[Figure 5.12.1 appears here: a plot over 0 ≤ x ≤ 10 of the exact function $f(x)=[7+(1+x)^{4/3}]^{1/3}$, the "power series (5 terms)" curve, and the "Padé (5 coefficients)" curve.]

Figure 5.12.1. The five-term power series expansion and the derived five-coefficient Padé approximant for a sample function f(x). The full power series converges only for x < 1. Note that the Padé approximant maintains accuracy far outside the radius of convergence of the series.

To evaluate the results, we need Deus ex machina (a useful fellow, when he is available) to tell us that equation (5.12.7) is in fact the power series expansion of the function

$$ f(x) = \left[7+(1+x)^{4/3}\right]^{1/3} \qquad (5.12.8) $$

which is plotted as the dotted curve in the figure. This function has a branch point at x = −1, so its power series converges only in the range −1 < x < 1; in most of the range plotted in the figure the series is divergent, yet the Padé approximant derived from it maintains accuracy (see Figure 5.12.1).

#include <math.h>
#include "nrutil.h"
#define BIG 1.0e30

void pade(double cof[], int n, float *resid)
/* Given cof[0..2*n], the leading terms in the power series expansion of a
   function, solve the linear Padé equations to return the coefficients of a
   diagonal rational function approximation to the same function, namely
   (cof[0]+cof[1]x+···+cof[n]x^N)/(1+cof[n+1]x+···+cof[2*n]x^N). The value
   resid is the norm of the residual vector; a small value indicates a
   well-converged solution. Note that cof is double precision for consistency
   with ratval. */

{
    void lubksb(float **a, int n, int *indx, float b[]);
    void ludcmp(float **a, int n, int *indx, float *d);
    void mprove(float **a, float **alud, int n, int indx[], float b[],
        float x[]);
    int j,k,*indx;
    float d,rr,rrold,sum,**q,**qlu,*x,*y,*z;

    indx=ivector(1,n);
    q=matrix(1,n,1,n);
    qlu=matrix(1,n,1,n);
    x=vector(1,n);
    y=vector(1,n);
    z=vector(1,n);
    for (j=1;j<=n;j++) {        /* Set up matrix for solving. */
        y[j]=x[j]=cof[n+j];
        for (k=1;k<=n;k++) {
            q[j][k]=cof[j-k+n];
            qlu[j][k]=q[j][k];
        }
    }
    ludcmp(qlu,n,indx,&d);      /* Solve by LU decomposition and
                                   backsubstitution. */
    lubksb(qlu,n,indx,x);
    rr=BIG;
    do {                        /* Important to use iterative improvement,
                                   since the Padé equations tend to be
                                   ill-conditioned. */
        rrold=rr;
        for (j=1;j<=n;j++) z[j]=x[j];
        mprove(q,qlu,n,indx,y,x);
        for (rr=0.0,j=1;j<=n;j++)       /* Calculate residual. */
            rr += SQR(z[j]-x[j]);
    } while (rr < rrold);       /* If it is no longer improving, call it
                                   quits. */
    *resid=sqrt(rrold);
    for (k=1;k<=n;k++) {        /* Calculate the remaining coefficients. */
        for (sum=cof[k],j=1;j<=k;j++) sum -= z[j]*cof[k-j];
        y[k]=sum;
    }
    for (j=1;j<=n;j++) {        /* Copy answers to output. */
        cof[j]=y[j];
        cof[j+n] = -z[j];
    }
    free_vector(z,1,n);
    free_vector(y,1,n);
    free_vector(x,1,n);
    free_matrix(qlu,1,n,1,n);
    free_matrix(q,1,n,1,n);
    free_ivector(indx,1,n);
}

CITED REFERENCES AND FURTHER READING:
Ralston, A., and Wilf, H.S. 1960, Mathematical Methods for Digital Computers (New York: Wiley), p. 14.
Cuyt, A., and Wuytack, L. 1987, Nonlinear Methods in Numerical Analysis (Amsterdam: North-Holland), Chapter 2.
Graves-Morris, P.R. 1979, in Padé Approximation and Its Applications, Lecture Notes in Mathematics, vol. 765, L. Wuytack, ed. (Berlin: Springer-Verlag). [1]

5.13 Rational Chebyshev Approximation

In §5.8 and §5.10 we learned how to find good polynomial approximations to a given function f(x) in a given interval a ≤ x ≤ b. Here, we want to generalize the task to find good approximations that are rational functions (see §5.3). The reason for doing so is that, for some functions and some intervals, the optimal rational function approximation is able to achieve substantially higher accuracy than the optimal polynomial approximation with the same number of coefficients. This must be weighed against the fact that finding a rational function approximation is not as straightforward as finding a polynomial approximation, which, as we saw, could be done elegantly via Chebyshev polynomials.

Let the desired rational function R(x) have numerator of degree m and denominator of degree k. Then we have

$$ R(x) \equiv \frac{p_0 + p_1 x + \cdots + p_m x^m}{1 + q_1 x + \cdots + q_k x^k} \approx f(x) \quad \text{for } a \le x \le b \qquad (5.13.1) $$

The unknown quantities that we need to find are $p_0,\ldots,p_m$ and $q_1,\ldots,q_k$, that is, m + k + 1 quantities in all. Let r(x) denote the deviation of R(x) from f(x), and let r denote its maximum absolute value,

$$ r(x) \equiv R(x) - f(x), \qquad r \equiv \max_{a \le x \le b} |r(x)| \qquad (5.13.2) $$

The ideal minimax solution would be that choice of p's and q's that minimizes r. Obviously there is some minimax solution, since r is bounded below by zero. How can we find it, or a reasonable approximation to it?

A first hint is furnished by the following fundamental theorem: If R(x) is nondegenerate (has no common polynomial factors in numerator and denominator), then there is a unique choice of p's and q's that minimizes r; for this choice, r(x) has m + k + 2 extrema in a ≤ x ≤ b, all of magnitude r and with alternating sign. (We have omitted some technical assumptions in this theorem. See Ralston [1] for a precise statement.) We thus learn that the situation with rational functions is quite analogous to that for minimax polynomials: In §5.8 we saw that the error term of an nth order approximation, with n + 1 Chebyshev coefficients, was generally dominated by the first neglected Chebyshev term, namely $T_{n+1}$, which itself has n + 2 extrema of equal magnitude and alternating sign. So, here, the number of rational coefficients, m + k + 1, plays the same role as the number of polynomial coefficients, n + 1.

A different way to see why r(x) should have m + k + 2 extrema is to note that R(x) can be made exactly equal to f(x) at any m + k + 1 points $x_i$. Multiplying equation (5.13.1) by its denominator gives the equations

$$ p_0 + p_1 x_i + \cdots + p_m x_i^m = f(x_i)\left(1 + q_1 x_i + \cdots + q_k x_i^k\right), \qquad i = 1,2,\ldots,m+k+1 \qquad (5.13.3) $$

This is a set of m + k + 1 linear equations for the unknown p's and q's, which can be solved by standard methods (e.g., LU decomposition).
If we choose the $x_i$'s to all be in the interval (a, b), then there will generically be an extremum between each chosen $x_i$ and $x_{i+1}$, plus also extrema where the function goes out of the interval at a and b, for a total of m + k + 2 extrema. For arbitrary $x_i$'s, the extrema will not have the same magnitude. The theorem says that, for one particular choice of $x_i$'s, the magnitudes can be beaten down to the identical, minimal, value of r.

Instead of making $f(x_i)$ and $R(x_i)$ equal at the points $x_i$, one can instead force the residual $r(x_i)$ to any desired values $y_i$ by solving the linear equations

$$ p_0 + p_1 x_i + \cdots + p_m x_i^m = \left[f(x_i) - y_i\right]\left(1 + q_1 x_i + \cdots + q_k x_i^k\right), \qquad i = 1,2,\ldots,m+k+1 \qquad (5.13.4) $$

In fact, if the $x_i$'s are chosen to be the extrema (not the zeros) of the minimax solution, then the equations satisfied will be

$$ p_0 + p_1 x_i + \cdots + p_m x_i^m = \left[f(x_i) \pm r\right]\left(1 + q_1 x_i + \cdots + q_k x_i^k\right), \qquad i = 1,2,\ldots,m+k+2 \qquad (5.13.5) $$

where the ± alternates for the alternating extrema. Notice that equation (5.13.5) is satisfied at m + k + 2 extrema, while equation (5.13.4) was satisfied only at m + k + 1 arbitrary points. How can this be? The answer is that r in equation (5.13.5) is an additional unknown, so that the number of both equations and unknowns is m + k + 2. True, the set is mildly nonlinear (in r), but in general it is still perfectly soluble by methods that we will develop in Chapter 9.

We thus see that, given only the locations of the extrema of the minimax rational function, we can solve for its coefficients and maximum deviation. Additional theorems, leading up to the so-called Remes algorithms [1], tell how to converge to these locations by an iterative process. For example, here is a (slightly simplified) statement of Remes' Second Algorithm: (1) Find an initial rational function with m + k + 2 extrema $x_i$ (not having equal deviation). (2) Solve equation (5.13.5) for new rational coefficients and r. (3) Evaluate the resulting R(x) to find its actual extrema (which will not be the same as the guessed values). (4) Replace each guessed value with the nearest actual extremum of the same sign. (5) Go back to step 2 and iterate to convergence. Under a broad set of assumptions, this method will converge. Ralston [1] fills in the necessary details, including how to find the initial set of $x_i$'s.

Up to this point, our discussion has been textbook-standard. We now reveal ourselves as heretics. We don't much like the elegant Remes algorithm. Its two nested iterations (on r in the nonlinear set 5.13.5, and on the new sets of $x_i$'s) are finicky and require a lot of special logic for degenerate cases. Even more heretical, we doubt that compulsive searching for the exactly best, equal deviation, approximation is worth the effort — except perhaps for those few people in the world whose business it is to find optimal approximations that get built into compilers and microchips.

When we use rational function approximation, the goal is usually much more pragmatic: Inside some inner loop we are evaluating some function a zillion times, and we want to speed up its evaluation. Almost never do we need this function to the last bit of machine accuracy. Suppose (heresy!) we use an approximation whose error has m + k + 2 extrema whose deviations differ by a factor of 2. The theorems on which the Remes algorithms are based guarantee that the perfect minimax solution will have extrema somewhere within this factor of 2 range – forcing down the higher extrema will cause the lower ones to rise, until all are equal.
So our "sloppy" approximation is in fact within a fraction of a least significant bit of the minimax one. That is good enough for us, especially when we have available a very robust method for finding the so-called "sloppy" approximation. Such a method is the least-squares solution of overdetermined linear equations by singular value decomposition (§2.6 and §15.4). We proceed as follows: First, solve (in the least-squares sense) equation (5.13.3), not just for m + k + 1 values of $x_i$, but for a significantly larger number of $x_i$'s, spaced approximately like the zeros of a high-order Chebyshev polynomial. This gives an initial guess for R(x). Second, tabulate the resulting deviations, find the mean absolute deviation, call it r, and then solve (again in the least-squares sense) equation (5.13.5) with r fixed and the ± chosen to be the sign of the observed deviation at each point $x_i$. Third, repeat the second step a few times.

You can spot some Remes orthodoxy lurking in our algorithm: The equations we solve are trying to bring the deviations not to zero, but rather to plus-or-minus some consistent value. However, we dispense with keeping track of actual extrema; and we solve only linear equations at each stage. One additional trick is to solve a weighted least-squares problem, where the weights are chosen to beat down the largest deviations fastest.

Here is a program implementing these ideas. Notice that the only calls to the function fn occur in the initial filling of the table fs. You could easily modify the code to do this filling outside of the routine. It is not even necessary that your abscissas xs be exactly the ones that we use, though the quality of the fit will deteriorate if you do not have several abscissas between each extremum of the (underlying) minimax solution. Notice that the rational coefficients are output in a format suitable for evaluation by the routine ratval in §5.3.

[Figure 5.13.1 appears here: deviations $f(x) - R(x)$ on $0 < x < \pi$ for the test problem $f(x)=\cos(x)/(1+e^x)$ with m = k = 4, the vertical scale running from $-2\times10^{-6}$ to $2\times10^{-6}$.]

Figure 5.13.1. Solid curves show deviations r(x) for five successive iterations of the routine ratlsq for an arbitrary test problem. The algorithm does not converge to exactly the minimax solution (shown as the dotted curve). But, after one iteration, the discrepancy is a small fraction of the last significant bit of accuracy.

#include <stdio.h>
#include <math.h>
#include "nrutil.h"
#define NPFAC 8
#define MAXIT 5
#define PIO2 (3.141592653589793/2.0)
#define BIG 1.0e30

void ratlsq(double (*fn)(double), double a, double b, int mm, int kk,
    double cof[], double *dev)
/* Returns in cof[0..mm+kk] the coefficients of a rational function
   approximation to the function fn in the interval (a,b). Input quantities
   mm and kk specify the order of the numerator and denominator,
   respectively. The maximum absolute deviation of the approximation (insofar
   as is known) is returned as dev. */
{
    double ratval(double x, double cof[], int mm, int kk);
    void dsvbksb(double **u, double w[], double **v, int m, int n, double b[],
        double x[]);
    void dsvdcmp(double **a, int m, int n, double w[], double **v);
    /* These are double versions of svdcmp, svbksb. */
    int i,it,j,ncof,npt;
    double devmax,e,hth,power,sum,*bb,*coff,*ee,*fs,**u,**v,*w,*wt,*xs;

    ncof=mm+kk+1;
    npt=NPFAC*ncof;     /* Number of points where function is evaluated,
                           i.e., fineness of the mesh. */
    bb=dvector(1,npt);
    coff=dvector(0,ncof-1);
    ee=dvector(1,npt);
    fs=dvector(1,npt);
    u=dmatrix(1,npt,1,ncof);
    v=dmatrix(1,ncof,1,ncof);
    w=dvector(1,ncof);
    wt=dvector(1,npt);

    xs=dvector(1,npt);
    *dev=BIG;
    for (i=1;i<=npt;i++) {      /* Fill arrays with mesh abscissas and
                                   function values. At each end, use formula
                                   that minimizes roundoff sensitivity. */
        if (i < npt/2) {
            hth=PIO2*(i-1)/(npt-1.0);
            xs[i]=a+(b-a)*DSQR(sin(hth));
        } else {
            hth=PIO2*(npt-i)/(npt-1.0);
            xs[i]=b-(b-a)*DSQR(sin(hth));
        }
        fs[i]=(*fn)(xs[i]);
        wt[i]=1.0;      /* In later iterations we will adjust these weights
                           to combat the largest deviations. */
        ee[i]=1.0;
    }
    e=0.0;
    for (it=1;it<=MAXIT;it++) {     /* Loop over iterations. */
        for (i=1;i<=npt;i++) {      /* Set up the "design matrix" for the
                                       least-squares fit. */
            power=wt[i];
            bb[i]=power*(fs[i]+SIGN(e,ee[i]));
            /* Key idea here: Fit to fn(x)+e where the deviation is positive,
               to fn(x)-e where it is negative. Then e is supposed to become
               an approximation to the equal-ripple deviation. */
            for (j=1;j<=mm+1;j++) {
                u[i][j]=power;
                power *= xs[i];
            }
            power = -bb[i];
            for (j=mm+2;j<=ncof;j++) {
                power *= xs[i];
                u[i][j]=power;
            }
        }
        dsvdcmp(u,npt,ncof,w,v);    /* Singular Value Decomposition. */
        /* In especially singular or difficult cases, one might here edit the
           singular values w[1..ncof], replacing small values by zero. Note
           that dsvbksb works with one-based arrays, so we must subtract 1
           when we pass it the zero-based array coff. */
        dsvbksb(u,w,v,npt,ncof,bb,coff-1);
        devmax=sum=0.0;     /* Tabulate the deviations and revise the
                               weights. */
        for (j=1;j<=npt;j++) {
            ee[j]=ratval(xs[j],coff,mm,kk)-fs[j];
            wt[j]=fabs(ee[j]);      /* Use weighting to emphasize most
                                       deviant points. */
            sum += wt[j];
            if (wt[j] > devmax) devmax=wt[j];
        }
        e=sum/npt;      /* Update e to be the mean absolute deviation. */
        if (devmax <= *dev) {       /* Save only the best coefficient set
                                       found. */
            for (j=0;j<ncof;j++) cof[j]=coff[j];
            *dev=devmax;
        }
        printf(" ratlsq iteration= %2d  max error= %10.3e\n",it,devmax);
    }
    free_dvector(xs,1,npt);
    free_dvector(wt,1,npt);
    free_dvector(w,1,ncof);
    free_dmatrix(v,1,ncof,1,ncof);
    free_dmatrix(u,1,npt,1,ncof);
    free_dvector(fs,1,npt);
    free_dvector(ee,1,npt);
    free_dvector(coff,0,ncof-1);
    free_dvector(bb,1,npt);
}
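As a usage illustration, here is a minimal driver of our own (not one of the book's listings) that applies ratlsq to the test function of Figure 5.13.1, assuming ratlsq, ratval, and their SVD and nrutil dependencies are linked in:

#include <stdio.h>
#include <math.h>

static double func(double x)    /* the test function of Figure 5.13.1 */
{
    return cos(x)/(1.0+exp(x));
}

int main(void)
{
    void ratlsq(double (*fn)(double), double a, double b, int mm, int kk,
        double cof[], double *dev);
    double ratval(double x, double cof[], int mm, int kk);
    double cof[9],dev;      /* mm+kk+1 = 9 coefficients for mm = kk = 4 */

    ratlsq(func,0.0,3.141592653589793,4,4,cof,&dev);
    printf("maximum deviation = %10.3e\n",dev);
    printf("R(1.0) = %12.8f   f(1.0) = %12.8f\n",
        ratval(1.0,cof,4,4),func(1.0));
    return 0;
}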

Figure 5.13.1 shows the discrepancies for the first five iterations of ratlsq when it is applied to find the m = k = 4 rational fit to the function $f(x) = \cos(x)/(1+e^x)$ in the interval $(0, \pi)$. One sees that after the first iteration, the results are virtually as good as the minimax solution. The iterations do not converge in the order that the figure suggests: In fact, it is the second iteration that is best (has smallest maximum deviation). The routine ratlsq accordingly returns the best of its iterations, not necessarily the last one; there is no advantage in doing more than five iterations.

CITED REFERENCES AND FURTHER READING:
Ralston, A., and Wilf, H.S. 1960, Mathematical Methods for Digital Computers (New York: Wiley), Chapter 13. [1]

5.14 Evaluation of Functions by Path Integration

In computer programming, the technique of choice is not necessarily the most efficient, or elegant, or fastest executing one. Instead, it may be the one that is quick to implement, general, and easy to check.

One sometimes needs only a few, or a few thousand, evaluations of a special function, perhaps a complex valued function of a complex variable, that has many different parameters, or asymptotic regimes, or both. Use of the usual tricks (series, continued fractions, rational function approximations, recurrence relations, and so forth) may result in a patchwork program with tests and branches to different formulas. While such a program may be highly efficient in execution, it is often not the shortest way to the answer from a standing start.

A different technique of considerable generality is direct integration of a function's defining differential equation – an ab initio integration for each desired function value — along a path in the complex plane if necessary. While this may at first seem like swatting a fly with a golden brick, it turns out that when you already have the brick, and the fly is asleep right under it, all you have to do is let it fall!

As a specific example, let us consider the complex hypergeometric function ${}_2F_1(a,b,c;z)$, which is defined as the analytic continuation of the so-called hypergeometric series,

$$ {}_2F_1(a,b,c;z) = 1 + \frac{ab}{c}\frac{z}{1!} + \frac{a(a+1)b(b+1)}{c(c+1)}\frac{z^2}{2!} + \cdots + \frac{a(a+1)\cdots(a+j-1)\,b(b+1)\cdots(b+j-1)}{c(c+1)\cdots(c+j-1)}\frac{z^j}{j!} + \cdots \qquad (5.14.1) $$

The series converges only within the unit circle $|z| < 1$ (see [1]), but one's interest in the function is often not confined to this region.

The hypergeometric function ${}_2F_1$ is a solution (in fact the solution that is regular at the origin) of the hypergeometric differential equation, which we can write as

$$ z(1-z)F'' + \left[c-(a+b+1)z\right]F' - abF = 0 \qquad (5.14.2) $$

Here prime denotes d/dz. One can see that the equation has regular singular points at z = 0, 1, and ∞. Since the desired solution is regular at z = 0, the values 1 and ∞ will in general be branch points. If we want ${}_2F_1$ to be a single valued function, we must have a branch cut connecting these two points. A conventional position for this cut is along the positive real axis from 1 to ∞, though we may wish to keep open the possibility of altering this choice for some applications.

Our golden brick consists of a collection of routines for the integration of sets of ordinary differential equations, which we will develop in detail later, in Chapter 16. For now, we need only a high-level, "black-box" routine that integrates such a set from initial conditions at one value of a (real) independent variable to final conditions at some other value of the independent variable, while automatically adjusting its internal stepsize to maintain some specified accuracy. That routine is called odeint and, in one particular invocation, calculates its individual steps with a sophisticated Bulirsch-Stoer technique.

Suppose that we know values for F and its derivative F' at some value $z_0$, and that we want to find F at some other point $z_1$ in the complex plane. The straight-line path connecting these two points is parametrized by

$$ z(s) = z_0 + s(z_1 - z_0) \qquad (5.14.3) $$

with s a real parameter. The differential equation (5.14.2) can now be written as a set of two first-order equations,

$$ \frac{dF}{ds} = (z_1 - z_0)\,F' $$
$$ \frac{dF'}{ds} = (z_1 - z_0)\,\frac{abF - \left[c-(a+b+1)z\right]F'}{z(1-z)} \qquad (5.14.4) $$

to be integrated from s = 0 to s = 1. Here F and F' are to be viewed as two independent complex variables. The fact that prime means d/dz can be ignored; it will emerge as a consequence of the first equation in (5.14.4). Moreover, the real and imaginary parts of equation (5.14.4) define a set of four real differential equations, with independent variable s. The complex arithmetic on the right-hand side can be viewed as mere shorthand for how the four components are to be coupled. It is precisely this point of view that gets passed to the routine odeint, since it knows nothing of either complex functions or complex independent variables.

It remains only to decide where to start, and what path to take in the complex plane, to get to an arbitrary point z. This is where consideration of the function's singularities, and the adopted branch cut, enter. Figure 5.14.1 shows the strategy that we adopt. For $|z| \le 1/2$, the series in equation (5.14.1) will in general converge rapidly, and it makes sense to use it directly. Otherwise, we integrate along a straight-line path from one of the starting points $(\pm 1/2, 0)$ or $(0, \pm 1/2)$. The former choices are natural for $0 < \mathrm{Re}(z) < 1$ and $\mathrm{Re}(z) < 0$, respectively.
The latter choices are used for $\mathrm{Re}(z) > 1$, above and below the branch cut; the purpose of starting away from the real axis in these cases is to avoid passing too close to the singularity at z = 1 (see Figure 5.14.1). The location of the branch cut is defined by the fact that our adopted strategy never integrates across the real axis for $\mathrm{Re}(z) > 1$.

An implementation of this algorithm is given in §6.12 as the routine hypgeo.
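To make the "four real equations" point of view concrete, here is a sketch of the right-hand side of equation (5.14.4) in the form that a routine like odeint expects. This is our own illustration, not the book's hypgeo implementation; for brevity it uses C99 complex arithmetic (the book's routines carry real and imaginary parts explicitly), and the global names aa, bb, cc, z0, and dz are hypothetical, assumed set by the caller before the integration.

#include <complex.h>

static double complex aa,bb,cc;     /* the parameters a, b, c (assumed set) */
static double complex z0,dz;        /* path start z0 and increment z1 - z0 */

/* Right-hand side of equation (5.14.4): y[1],y[2] hold Re F, Im F and
   y[3],y[4] hold Re F', Im F'; odeint sees only these four real variables. */
void hypdrv_sketch(double s, double y[], double dyds[])
{
    double complex F=y[1]+I*y[2];
    double complex Fp=y[3]+I*y[4];
    double complex z=z0+s*dz;       /* equation (5.14.3) */
    double complex dF=dz*Fp;
    double complex dFp=dz*(aa*bb*F-(cc-(aa+bb+1.0)*z)*Fp)/(z*(1.0-z));

    dyds[1]=creal(dF);
    dyds[2]=cimag(dF);
    dyds[3]=creal(dFp);
    dyds[4]=cimag(dFp);
}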

[Figure 5.14.1 appears here: the complex plane (Re and Im axes) with the region "use power series" inside the circle |z| = 1/2, the branch cut running from 1 to ∞ along the real axis, and integration paths fanning out from the circle.]

Figure 5.14.1. Complex plane showing the singular points of the hypergeometric function, its branch cut, and some integration paths from the circle |z| = 1/2 (where the power series converges rapidly) to other points in the plane.

A number of variants on the procedure described thus far are possible, and easy to program. If successively called values of z are close together (with identical values of a, b, and c), then you can save the state vector (F, F') and the corresponding value of z on each call, and use these as starting values for the next call. The incremental integration may then take only one or two steps. Avoid integrating across the branch cut unintentionally: the function value will be "correct," but not the one you want.

Alternatively, you may wish to integrate to some position z by a dog-leg path that does cross the real axis for Re z > 1, as a means of moving the branch cut. For example, in some cases you might want to integrate from (0, 1/2) to (3/2, 1/2), and go from there to any point with Re z > 1 — with either sign of Im z. (If you are, for example, finding roots of a function by an iterative method, you do not want the integration for nearby values to take different paths around a branch point. If it does, your root-finder will see discontinuous function values, and will likely not converge correctly!)

In any case, be aware that a loss of numerical accuracy can result if you integrate through a region of large function value on your way to a final answer where the function value is small. (For the hypergeometric function, a particular case of this is when a and b are both large and positive, with c and $x \gtrsim 1$.) In such cases, you'll need to find a better dog-leg path.

The general technique of evaluating a function by integrating its differential equation in the complex plane can also be applied to other special functions.

For example, the complex Bessel function, Airy function, Coulomb wave function, and Weber function are all special cases of the confluent hypergeometric function, with a differential equation similar to the one used above (see, e.g., [1] §13.6, for a table of special cases). The confluent hypergeometric function has no singularities at finite z: That makes it easy to integrate. However, its essential singularity at infinity means that it can have, along some paths and for some parameters, highly oscillatory or exponentially decreasing behavior: That makes it hard to integrate. Some case by case judgment (or experimentation) is therefore required.

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York). [1]

Chapter 6.  Special Functions

6.0 Introduction

There is nothing particularly special about a special function, except that some person in authority or textbook writer (not the same thing!) has decided to bestow the moniker. Special functions are sometimes called higher transcendental functions (higher than what?) or functions of mathematical physics (but they occur in other fields also) or functions that satisfy certain frequently occurring second-order differential equations (but not all special functions do). One might simply call them "useful functions" and let it go at that; it is surely only a matter of taste which functions we have chosen to include in this chapter.

Good commercially available program libraries, such as NAG or IMSL, contain routines for a number of special functions. These routines are intended for users who will have no idea what goes on inside them. Such state of the art "black boxes" are often very messy things, full of branches to completely different methods depending on the value of the calling arguments. Black boxes have, or should have, careful control of accuracy, to some stated uniform precision in all regimes.

We will not be quite so fastidious in our examples, in part because we want to illustrate techniques from Chapter 5, and in part because we want you to understand what goes on in the routines presented. Some of our routines have an accuracy parameter that can be made as small as desired, while others (especially those involving polynomial fits) give only a certain accuracy, one that we believe serviceable (typically six significant figures or more). We do not certify that the routines are perfect black boxes. We do hope that, if you ever encounter trouble in a routine, you will be able to diagnose and correct the problem on the basis of the information that we have given.

In short, the special function routines of this chapter are meant to be used — we use them all the time — but we also want you to be prepared to understand their inner workings.

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York) [full of useful numerical approximations to a great variety of functions].
IMSL Sfun/Library Users Manual (IMSL Inc., 2500 CityWest Boulevard, Houston TX 77042).
NAG Fortran Library (Numerical Algorithms Group, 256 Banbury Road, Oxford OX27DE, U.K.), Chapter S.

Hart, J.F., et al. 1968, Computer Approximations (New York: Wiley).
Hastings, C. 1955, Approximations for Digital Computers (Princeton: Princeton University Press).
Luke, Y.L. 1975, Mathematical Functions and Their Approximations (New York: Academic Press).

6.1 Gamma Function, Beta Function, Factorials, Binomial Coefficients

The gamma function is defined by the integral

$$ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt \qquad (6.1.1) $$

When the argument z is an integer, the gamma function is just the familiar factorial function, but offset by one,

$$ n! = \Gamma(n+1) \qquad (6.1.2) $$

The gamma function satisfies the recurrence relation

$$ \Gamma(z+1) = z\,\Gamma(z) \qquad (6.1.3) $$

If the function is known for arguments z > 1 or, more generally, in the half complex plane Re(z) > 1, it can be obtained for z < 1 or Re(z) < 1 by the reflection formula

$$ \Gamma(1-z) = \frac{\pi}{\Gamma(z)\sin(\pi z)} = \frac{\pi z}{\Gamma(1+z)\sin(\pi z)} \qquad (6.1.4) $$

Notice that Γ(z) has a pole at z = 0, and at all negative integer values of z.

There are a variety of methods in use for calculating the function Γ(z) numerically, but none is quite as neat as the approximation derived by Lanczos [1]. This scheme is entirely specific to the gamma function, seemingly plucked from thin air. We will not attempt to derive the approximation, but only state the resulting formula: For certain integer choices of γ and N, and for certain coefficients $c_1, c_2, \ldots, c_N$, the gamma function is given by

$$ \Gamma(z+1) = \left(z+\gamma+\tfrac{1}{2}\right)^{z+1/2} e^{-(z+\gamma+1/2)} \times \sqrt{2\pi}\left[c_0 + \frac{c_1}{z+1} + \frac{c_2}{z+2} + \cdots + \frac{c_N}{z+N} + \epsilon\right] \qquad (z > 0) \qquad (6.1.5) $$

You can see that this is a sort of take-off on Stirling's approximation, but with a series of corrections that take into account the first few poles in the left complex plane. The constant $c_0$ is very nearly equal to 1. The error term is parametrized by ε. For γ = 5, N = 6, and a certain set of c's, the error is smaller than $|\epsilon| < 2\times10^{-10}$. Impressed? If not, then perhaps you will be impressed by the fact that (with these same parameters) the formula (6.1.5) and bound on ε apply for the complex gamma function, everywhere in the half complex plane Re z > 0.

It is better to implement ln Γ(x) than Γ(x), since the latter will overflow many computers' floating-point representation at quite modest values of x. Often the gamma function is used in calculations where the large values of Γ(x) are divided by other large numbers, with the result being a perfectly ordinary value. Such operations would normally be coded as subtraction of logarithms. With (6.1.5) in hand, we can compute the logarithm of the gamma function with two calls to a logarithm and 25 or so arithmetic operations. This makes it not much more difficult than other built-in functions that we take for granted, such as sin x or e^x:

#include <math.h>

float gammln(float xx)
/* Returns the value ln[Γ(xx)] for xx > 0. */
{
    /* Internal arithmetic will be done in double precision, a nicety that
       you can omit if five-figure accuracy is good enough. */
    double x,y,tmp,ser;
    static double cof[6]={76.18009172947146,-86.50532032941677,
        24.01409824083091,-1.231739572450155,
        0.1208650973866179e-2,-0.5395239384953e-5};
    int j;

    y=x=xx;
    tmp=x+5.5;
    tmp -= (x+0.5)*log(tmp);
    ser=1.000000000190015;
    for (j=0;j<=5;j++) ser += cof[j]/++y;
    return -tmp+log(2.5066282746310005*ser/x);
}

How shall we write a routine for the factorial function n!? Generally the factorial function will be called for small integer values (for large values it will overflow anyway!), and in most applications the same integer value will be called for many times. It is a profligate waste of computer time to call exp(gammln(n+1.0)) for each required factorial. Better to go back to basics, holding gammln in reserve for unlikely calls:

#include <math.h>

float factrl(int n)
/* Returns the value n! as a floating-point number. */
{
    float gammln(float xx);
    void nrerror(char error_text[]);
    static int ntop=4;
    static float a[33]={1.0,1.0,2.0,6.0,24.0};  /* Fill in table only as
                                                   required. */
    int j;

    if (n < 0) nrerror("Negative factorial in routine factrl");
    if (n > 32) return exp(gammln(n+1.0));
    /* Larger value than size of table is required. Actually, this big a
       value is going to overflow on many computers, but no harm in trying. */
    while (ntop < n) {      /* Fill in table up to desired value. */
        j=ntop++;
        a[ntop]=a[j]*ntop;
    }
    return a[n];
}
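As a quick check, the two routes to a factorial can be compared directly; this is our own snippet, not one of the book's listings, and both lines should print 3628800 (exactly for factrl, to within roundoff for the gammln route):

#include <stdio.h>
#include <math.h>

int main(void)
{
    float gammln(float xx),factrl(int n);

    printf("factrl(10)        = %.1f\n",factrl(10));
    printf("exp(gammln(11.0)) = %.1f\n",exp(gammln(11.0)));
    return 0;
}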

A useful point is that factrl will be exact for the smaller values of n, since floating-point multiplies on small integers are exact on all computers. This exactness will not hold if we turn to the logarithm of the factorials. For binomial coefficients, however, we must do exactly this, since the individual factorials in a binomial coefficient will overflow long before the coefficient itself will.

The binomial coefficient is defined by

$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \qquad 0 \le k \le n \qquad (6.1.6) $$

#include <math.h>

float bico(int n, int k)
/* Returns the binomial coefficient (n choose k) as a floating-point
   number. */
{
    float factln(int n);

    return floor(0.5+exp(factln(n)-factln(k)-factln(n-k)));
    /* The floor function cleans up roundoff error for smaller values of
       n and k. */
}

which uses

float factln(int n)
/* Returns ln(n!). */
{
    float gammln(float xx);
    void nrerror(char error_text[]);
    static float a[101];    /* A static array is automatically initialized
                               to zero. */

    if (n < 0) nrerror("Negative factorial in routine factln");
    if (n <= 1) return 0.0;
    if (n <= 100) return a[n] ? a[n] : (a[n]=gammln(n+1.0));    /* In range
                                                                   of table. */
    else return gammln(n+1.0);      /* Out of range of table. */
}

If your problem requires a series of related binomial coefficients, a good idea is to use recurrence relations, for example

$$ \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1} = \frac{n+1}{n+1-k}\binom{n}{k} $$
$$ \binom{n}{k+1} = \frac{n-k}{k+1}\binom{n}{k} \qquad (6.1.7) $$

Finally, turning away from the combinatorial functions with integer valued arguments, we come to the beta function,

$$ B(z,w) = B(w,z) = \int_0^1 t^{z-1}(1-t)^{w-1}\,dt \qquad (6.1.8) $$

which is related to the gamma function by

$$ B(z,w) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z+w)} \qquad (6.1.9) $$

hence

#include <math.h>

float beta(float z, float w)
/* Returns the value of the beta function B(z,w). */
{
    float gammln(float xx);

    return exp(gammln(z)+gammln(w)-gammln(z+w));
}

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 6.
Lanczos, C. 1964, SIAM Journal on Numerical Analysis, ser. B, vol. 1, pp. 86–96. [1]

6.2 Incomplete Gamma Function, Error Function, Chi-Square Probability Function, Cumulative Poisson Function

The incomplete gamma function is defined by

$$ P(a,x) \equiv \frac{\gamma(a,x)}{\Gamma(a)} \equiv \frac{1}{\Gamma(a)}\int_0^x e^{-t}t^{a-1}\,dt \qquad (a > 0) \qquad (6.2.1) $$

It has the limiting values

$$ P(a,0) = 0 \quad \text{and} \quad P(a,\infty) = 1 \qquad (6.2.2) $$

The incomplete gamma function P(a,x) is monotonic and (for a greater than one or so) rises from "near-zero" to "near-unity" in a range of x centered on about a − 1, and of width about $\sqrt{a}$ (see Figure 6.2.1).

The complement of P(a,x) is also confusingly called an incomplete gamma function,

$$ Q(a,x) \equiv 1 - P(a,x) \equiv \frac{\Gamma(a,x)}{\Gamma(a)} \equiv \frac{1}{\Gamma(a)}\int_x^{\infty} e^{-t}t^{a-1}\,dt \qquad (a > 0) \qquad (6.2.3) $$

[Figure 6.2.1 appears here: P(a,x) plotted against x from 0 to 14 for a = 0.5, 1.0, 3.0, and 10, each curve rising from 0 to 1.]

Figure 6.2.1. The incomplete gamma function P(a,x) for four values of a.

It has the limiting values

$$ Q(a,0) = 1 \quad \text{and} \quad Q(a,\infty) = 0 \qquad (6.2.4) $$

The notations P(a,x), γ(a,x), and Γ(a,x) are standard; the notation Q(a,x) is specific to this book.

There is a series development for γ(a,x) as follows:

$$ \gamma(a,x) = e^{-x}x^a \sum_{n=0}^{\infty} \frac{\Gamma(a)}{\Gamma(a+1+n)}\,x^n \qquad (6.2.5) $$

One does not actually need to compute a new Γ(a+1+n) for each n; one rather uses equation (6.1.3) and the previous coefficient.

A continued fraction development for Γ(a,x) is

$$ \Gamma(a,x) = e^{-x}x^a\left(\frac{1}{x+}\;\frac{1-a}{1+}\;\frac{1}{x+}\;\frac{2-a}{1+}\;\frac{2}{x+}\cdots\right) \qquad (x > 0) \qquad (6.2.6) $$

It is computationally better to use the even part of (6.2.6), which converges twice as fast (see §5.2):

$$ \Gamma(a,x) = e^{-x}x^a\left(\frac{1}{x+1-a-}\;\frac{1\cdot(1-a)}{x+3-a-}\;\frac{2\cdot(2-a)}{x+5-a-}\cdots\right) \qquad (x > 0) \qquad (6.2.7) $$

It turns out that (6.2.5) converges rapidly for x less than about a + 1, while (6.2.6) or (6.2.7) converges rapidly for x greater than about a + 1.

In these respective regimes each requires at most a few times $\sqrt{a}$ terms to converge, and this many only near x = a, where the incomplete gamma functions are varying most rapidly. Thus (6.2.5) and (6.2.7) together allow evaluation of the function for all positive a and x. An extra dividend is that we never need compute a function value near zero by subtracting two nearly equal numbers. The higher-level functions that return P(a,x) and Q(a,x) are

float gammp(float a, float x)
/* Returns the incomplete gamma function P(a,x). */
{
    void gcf(float *gammcf, float a, float x, float *gln);
    void gser(float *gamser, float a, float x, float *gln);
    void nrerror(char error_text[]);
    float gamser,gammcf,gln;

    if (x < 0.0 || a <= 0.0) nrerror("Invalid arguments in routine gammp");
    if (x < (a+1.0)) {      /* Use the series representation. */
        gser(&gamser,a,x,&gln);
        return gamser;
    } else {                /* Use the continued fraction representation */
        gcf(&gammcf,a,x,&gln);
        return 1.0-gammcf;  /* and take its complement. */
    }
}

float gammq(float a, float x)
/* Returns the incomplete gamma function Q(a,x) ≡ 1 − P(a,x). */
{
    void gcf(float *gammcf, float a, float x, float *gln);
    void gser(float *gamser, float a, float x, float *gln);
    void nrerror(char error_text[]);
    float gamser,gammcf,gln;

    if (x < 0.0 || a <= 0.0) nrerror("Invalid arguments in routine gammq");
    if (x < (a+1.0)) {      /* Use the series representation */
        gser(&gamser,a,x,&gln);
        return 1.0-gamser;  /* and take its complement. */
    } else {                /* Use the continued fraction representation. */
        gcf(&gammcf,a,x,&gln);
        return gammcf;
    }
}

The argument gln is set by both the series and continued fraction procedures to the value ln Γ(a); the reason for this is so that it is available to you if you want to modify the above two procedures to give γ(a,x) and Γ(a,x), in addition to P(a,x) and Q(a,x) (cf. equations 6.2.1 and 6.2.3).

The functions gser and gcf which implement (6.2.5) and (6.2.7) are

#include <math.h>
#define ITMAX 100
#define EPS 3.0e-7

void gser(float *gamser, float a, float x, float *gln)
/* Returns the incomplete gamma function P(a,x) evaluated by its series
   representation as gamser. Also returns ln Γ(a) as gln. */
{
    float gammln(float xx);

    void nrerror(char error_text[]);
    int n;
    float sum,del,ap;

    *gln=gammln(a);
    if (x <= 0.0) {
        if (x < 0.0) nrerror("x less than 0 in routine gser");
        *gamser=0.0;
        return;
    } else {
        ap=a;
        del=sum=1.0/a;
        for (n=1;n<=ITMAX;n++) {
            ++ap;
            del *= x/ap;
            sum += del;
            if (fabs(del) < fabs(sum)*EPS) {
                *gamser=sum*exp(-x+a*log(x)-(*gln));
                return;
            }
        }
        nrerror("a too large, ITMAX too small in routine gser");
        return;
    }
}

#include <math.h>
#define ITMAX 100       /* Maximum allowed number of iterations. */
#define EPS 3.0e-7      /* Relative accuracy. */
#define FPMIN 1.0e-30   /* Number near the smallest representable
                           floating-point number. */

void gcf(float *gammcf, float a, float x, float *gln)
/* Returns the incomplete gamma function Q(a,x) evaluated by its continued
   fraction representation as gammcf. Also returns ln Γ(a) as gln. */
{
    float gammln(float xx);
    void nrerror(char error_text[]);
    int i;
    float an,b,c,d,del,h;

    *gln=gammln(a);
    b=x+1.0-a;          /* Set up for evaluating continued fraction by
                           modified Lentz's method (§5.2) with b_0 = 0. */
    c=1.0/FPMIN;
    d=1.0/b;
    h=d;
    for (i=1;i<=ITMAX;i++) {    /* Iterate to convergence. */
        an = -i*(i-a);
        b += 2.0;
        d=an*d+b;
        if (fabs(d) < FPMIN) d=FPMIN;
        c=b+an/c;
        if (fabs(c) < FPMIN) c=FPMIN;
        d=1.0/d;
        del=d*c;
        h *= del;
        if (fabs(del-1.0) < EPS) break;
    }
    if (i > ITMAX) nrerror("a too large, ITMAX too small in gcf");
    *gammcf=exp(-x+a*log(x)-(*gln))*h;      /* Put factors in front. */
}
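A small consistency check, our own snippet rather than one of the book's listings, exercises the pair of routines: for any valid a and x, gammp and gammq should sum to unity to within the stated accuracy EPS.

#include <stdio.h>

int main(void)
{
    float gammp(float a, float x),gammq(float a, float x);
    float a=3.0,x=2.5;

    printf("P(a,x) = %f  Q(a,x) = %f  P+Q = %f\n",
        gammp(a,x),gammq(a,x),gammp(a,x)+gammq(a,x));
    return 0;
}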

Error Function

The error function and complementary error function are special cases of the incomplete gamma function, and are obtained moderately efficiently by the above procedures. Their definitions are

$$ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \qquad (6.2.8) $$

and

$$ \mathrm{erfc}(x) \equiv 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt \qquad (6.2.9) $$

The functions have the following limiting values and symmetries:

$$ \mathrm{erf}(0) = 0 \qquad \mathrm{erf}(\infty) = 1 \qquad \mathrm{erf}(-x) = -\mathrm{erf}(x) \qquad (6.2.10) $$
$$ \mathrm{erfc}(0) = 1 \qquad \mathrm{erfc}(\infty) = 0 \qquad \mathrm{erfc}(-x) = 2 - \mathrm{erfc}(x) \qquad (6.2.11) $$

They are related to the incomplete gamma functions by

$$ \mathrm{erf}(x) = P\!\left(\tfrac{1}{2},x^2\right) \qquad (x \ge 0) \qquad (6.2.12) $$

and

$$ \mathrm{erfc}(x) = Q\!\left(\tfrac{1}{2},x^2\right) \qquad (x \ge 0) \qquad (6.2.13) $$

We'll put an extra "f" into our routine names to avoid conflicts with names already in some C libraries:

float erff(float x)
/* Returns the error function erf(x). */
{
    float gammp(float a, float x);

    return x < 0.0 ? -gammp(0.5,x*x) : gammp(0.5,x*x);
}

float erffc(float x)
/* Returns the complementary error function erfc(x). */
{
    float gammp(float a, float x);
    float gammq(float a, float x);

    return x < 0.0 ? 1.0+gammp(0.5,x*x) : gammq(0.5,x*x);
}

If you care to do so, you can easily remedy the minor inefficiency in erff and erffc, namely that $\Gamma(0.5) = \sqrt{\pi}$ is computed unnecessarily when gammp or gammq is called. Before you do that, however, you might wish to consider the following routine, based on Chebyshev fitting to an inspired guess as to the functional form:

#include <math.h>

float erfcc(float x)
/* Returns the complementary error function erfc(x) with fractional error
   everywhere less than 1.2e-7. */
{
    float t,z,ans;

    z=fabs(x);
    t=1.0/(1.0+0.5*z);
    ans=t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+
        t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+
        t*(-0.82215223+t*0.17087277)))))))));
    return x >= 0.0 ? ans : 2.0-ans;
}

There are also some functions of two variables that are special cases of the incomplete gamma function:

Cumulative Poisson Probability Function. $P_x(<k)$, for positive x and integer k ≥ 1, denotes the cumulative Poisson probability function: the probability that the number of Poisson random events occurring will be between 0 and k − 1 inclusive, if the expected mean number is x. Its relation to the incomplete gamma function is simply

$$ P_x(<k) = Q(k,x) = \mathrm{gammq}(k,x) $$

Chi-Square Probability Function. $P(\chi^2|\nu)$ is the probability that the observed chi-square for a correctly fitted model should be less than a value $\chi^2$; its complement $Q(\chi^2|\nu)$ is the probability that the observed chi-square will exceed the value $\chi^2$ by chance even for a correct model. Their relations to the incomplete gamma functions are

$$ P(\chi^2|\nu) = P\!\left(\frac{\nu}{2},\frac{\chi^2}{2}\right) = \mathrm{gammp}\!\left(\frac{\nu}{2},\frac{\chi^2}{2}\right), \qquad Q(\chi^2|\nu) = Q\!\left(\frac{\nu}{2},\frac{\chi^2}{2}\right) = \mathrm{gammq}\!\left(\frac{\nu}{2},\frac{\chi^2}{2}\right) $$
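By way of illustration, our own snippet (not one of the book's listings): the significance of a chi-square of 7.81 with 3 degrees of freedom, close to the conventional 5% level, is obtained with a single call.

#include <stdio.h>

int main(void)
{
    float gammq(float a, float x);

    /* Q(chi-square = 7.81 | nu = 3) = gammq(nu/2, chisq/2), about 0.05 */
    printf("Q = %f\n",gammq(3.0/2.0,7.81/2.0));
    return 0;
}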

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapters 6, 7, and 26.
Pearson, K. (ed.) 1951, Tables of the Incomplete Gamma Function (Cambridge: Cambridge University Press).

6.3 Exponential Integrals

The standard definition of the exponential integral is

$$ E_n(x) = \int_1^{\infty} \frac{e^{-xt}}{t^n}\,dt, \qquad x > 0,\ n = 0,1,\ldots \qquad (6.3.1) $$

The function defined by the principal value of the integral

$$ \mathrm{Ei}(x) = -\int_{-x}^{\infty} \frac{e^{-t}}{t}\,dt = \int_{-\infty}^{x} \frac{e^{t}}{t}\,dt, \qquad x > 0 \qquad (6.3.2) $$

is also called an exponential integral. Note that Ei(−x) is related to $-E_1(x)$ by analytic continuation.

The function $E_n(x)$ is a special case of the incomplete gamma function

$$ E_n(x) = x^{n-1}\,\Gamma(1-n,x) \qquad (6.3.3) $$

We can therefore use a similar strategy for evaluating it. The continued fraction — just equation (6.2.6) rewritten — converges for all x > 0:

$$ E_n(x) = e^{-x}\left(\frac{1}{x+}\;\frac{n}{1+}\;\frac{1}{x+}\;\frac{n+1}{1+}\;\frac{2}{x+}\cdots\right) \qquad (6.3.4) $$

We use it in its more rapidly converging even form,

$$ E_n(x) = e^{-x}\left(\frac{1}{x+n-}\;\frac{1\cdot n}{x+n+2-}\;\frac{2(n+1)}{x+n+4-}\cdots\right) \qquad (6.3.5) $$

The continued fraction only really converges fast enough to be useful for $x \gtrsim 1$. For $0 < x \lesssim 1$, we can use the series representation

$$ E_n(x) = \frac{(-x)^{n-1}}{(n-1)!}\left[-\ln x + \psi(n)\right] - \sum_{\substack{m=0\\ m \ne n-1}}^{\infty} \frac{(-x)^m}{(m-n+1)\,m!} \qquad (6.3.6) $$

where ψ(n) is the digamma function, given for integer arguments by

$$ \psi(1) = -\gamma, \qquad \psi(n) = -\gamma + \sum_{m=1}^{n-1} \frac{1}{m} \qquad (6.3.7) $$

where γ = 0.5772156649... is Euler's constant. We evaluate the expression (6.3.6) in order of ascending powers of x:

$$ E_n(x) = -\left[\frac{1}{1-n} - \frac{x}{2-n} + \frac{x^2}{(3-n)(1\cdot 2)} - \cdots + \frac{(-x)^{n-2}}{(-1)(n-2)!}\right] + \frac{(-x)^{n-1}}{(n-1)!}\left[-\ln x + \psi(n)\right] - \left[\frac{(-x)^n}{1\cdot n!} + \frac{(-x)^{n+1}}{2\cdot(n+1)!} + \cdots\right] \qquad (6.3.8) $$

The first square bracket is omitted when n = 1. This method of evaluation has the advantage that for large n the series converges before reaching the term containing ψ(n). Accordingly, one needs an algorithm for evaluating ψ(n) only for small n, $n \lesssim 20$ – 40. We use equation (6.3.7), although a table look-up would improve efficiency slightly.

Amos [1] presents a careful discussion of the truncation error in evaluating equation (6.3.8), and gives a fairly elaborate termination criterion. We have found that simply stopping when the last term added is smaller than the required tolerance works about as well.

Two special cases have to be handled separately:

$$ E_0(x) = \frac{e^{-x}}{x} $$
$$ E_n(0) = \frac{1}{n-1}, \qquad n > 1 \qquad (6.3.9) $$

The routine expint allows fast evaluation of $E_n(x)$ to any accuracy EPS within the reach of your machine's word length for floating-point numbers. The only modification required for increased accuracy is to supply Euler's constant with enough significant digits. Wrench [2] can provide you with the first 328 digits if necessary!

#include <math.h>
#define MAXIT 100           /* Maximum allowed number of iterations. */
#define EULER 0.5772156649  /* Euler's constant γ. */
#define FPMIN 1.0e-30       /* Close to smallest representable floating-point
                               number. */
#define EPS 1.0e-7          /* Desired relative error, not smaller than the
                               machine precision. */

float expint(int n, float x)
/* Evaluates the exponential integral E_n(x). */
{
    void nrerror(char error_text[]);
    int i,ii,nm1;
    float a,b,c,d,del,fact,h,psi,ans;

    nm1=n-1;
    if (n < 0 || x < 0.0 || (x==0.0 && (n==0 || n==1)))
        nrerror("bad arguments in expint");
    else {
        if (n == 0) ans=exp(-x)/x;      /* Special case. */
        else {
            if (x == 0.0) ans=1.0/nm1;  /* Another special case. */
            else {

                if (x > 1.0) {            /* Lentz's algorithm (§5.2). */
                    b=x+n;
                    c=1.0/FPMIN;
                    d=1.0/b;
                    h=d;
                    for (i=1;i<=MAXIT;i++) {
                        a = -i*(nm1+i);
                        b += 2.0;
                        d=1.0/(a*d+b);    /* Denominators cannot be zero. */
                        c=b+a/c;
                        del=c*d;
                        h *= del;
                        if (fabs(del-1.0) < EPS) {
                            ans=h*exp(-x);
                            return ans;
                        }
                    }
                    nrerror("continued fraction failed in expint");
                } else {                  /* Evaluate series. */
                    ans = (nm1!=0 ? 1.0/nm1 : -log(x)-EULER);  /* Set first term. */
                    fact=1.0;
                    for (i=1;i<=MAXIT;i++) {
                        fact *= -x/i;
                        if (i != nm1) del = -fact/(i-nm1);
                        else {
                            psi = -EULER; /* Compute psi(n). */
                            for (ii=1;ii<=nm1;ii++) psi += 1.0/ii;
                            del=fact*(-log(x)+psi);
                        }
                        ans += del;
                        if (fabs(del) < fabs(ans)*EPS) return ans;
                    }
                    nrerror("series failed in expint");
                }
            }
        }
    }
    return ans;
}

A good algorithm for evaluating Ei(x) is to use the power series for small x and the asymptotic series for large x. The power series is

    \mathrm{Ei}(x) = \gamma + \ln x + \frac{x}{1\cdot 1!} + \frac{x^2}{2\cdot 2!} + \cdots    (6.3.10)

where γ is Euler's constant. The asymptotic expansion is

    \mathrm{Ei}(x) \sim \frac{e^{x}}{x}\left(1 + \frac{1!}{x} + \frac{2!}{x^2} + \cdots\right)    (6.3.11)

The lower limit for the use of the asymptotic expansion is approximately |ln EPS|, where EPS is the required relative error. (With the value EPS = 6.0e-8 used in the routine below, this crossover falls at x ≈ |ln(6 × 10^-8)| ≈ 16.6; the code tests x <= -log(EPS).)

#include <math.h>
#define EULER 0.57721566    /* Euler's constant gamma. */
#define MAXIT 100           /* Maximum number of iterations allowed. */
#define FPMIN 1.0e-30       /* Close to smallest representable floating-point number. */
#define EPS 6.0e-8          /* Relative error, or absolute error near the zero of Ei at x = 0.3725. */

float ei(float x)
/* Computes the exponential integral Ei(x) for x > 0. */
{
    void nrerror(char error_text[]);
    int k;
    float fact,prev,sum,term;

    if (x <= 0.0) nrerror("Bad argument in ei");
    if (x < FPMIN) return log(x)+EULER;  /* Special case: avoid failure of convergence
                                            test because of underflow. */
    if (x <= -log(EPS)) {                /* Use power series. */
        sum=0.0;
        fact=1.0;
        for (k=1;k<=MAXIT;k++) {
            fact *= x/k;
            term=fact/k;
            sum += term;
            if (term < EPS*sum) break;
        }
        if (k > MAXIT) nrerror("Series failed in ei");
        return sum+log(x)+EULER;
    } else {                             /* Use asymptotic series. */
        sum=0.0;                         /* Start with second term. */
        term=1.0;
        for (k=1;k<=MAXIT;k++) {
            prev=term;
            term *= k/x;
            if (term < EPS) break;
            /* Since final sum is greater than one, term itself approximates the
               relative error. */
            if (term < prev) sum += term;  /* Still converging: add new term. */
            else {
                sum -= prev;               /* Diverging: subtract previous term and exit. */
                break;
            }
        }
        return exp(x)*(1.0+sum)/x;
    }
}

CITED REFERENCES AND FURTHER READING:
Stegun, I.A., and Zucker, R. 1974, Journal of Research of the National Bureau of Standards, vol. 78B, pp. 199-216; 1976, op. cit., vol. 80B, pp. 291-311.
Amos, D.E. 1980, ACM Transactions on Mathematical Software, vol. 6, pp. 365-377 [1]; also vol. 6, pp. 420-428.
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 5.
Wrench, J.W. 1952, Mathematical Tables and Other Aids to Computation, vol. 6, p. 255. [2]
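As a quick sanity check of the two routines of this section, the following minimal driver (not part of the book; it assumes expint, ei, and the nrerror error handler are compiled and linked in) compares a few values against tabulated ones from Abramowitz and Stegun, Chapter 5:

#include <stdio.h>

float expint(int n, float x);
float ei(float x);

int main(void)
{
    /* Tabulated values: E1(1) = 0.2193839..., Ei(1) = 1.8951178...;
       E2(1) = exp(-1) - E1(1) = 0.1484955... follows from the
       recurrence E_{n+1}(x) = (1/n)[exp(-x) - x E_n(x)]. */
    printf("E1(1.0) = %f  (expect 0.219384)\n", expint(1,1.0f));
    printf("E2(1.0) = %f  (expect 0.148496)\n", expint(2,1.0f));
    printf("Ei(1.0) = %f  (expect 1.895117)\n", ei(1.0f));
    return 0;
}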

Figure 6.4.1. The incomplete beta function I_x(a,b) for five different pairs of (a,b): (0.5,0.5), (0.5,5.0), (1.0,3.0), (8.0,10.0), and (5.0,0.5). Notice that the pairs (0.5,5.0) and (5.0,0.5) are symmetrically related as indicated in equation (6.4.3). [Figure: plot of I_x(a,b) against x on (0,1).]

6.4 Incomplete Beta Function, Student's Distribution, F-Distribution, Cumulative Binomial Distribution

The incomplete beta function is defined by

    I_x(a,b) \equiv \frac{B_x(a,b)}{B(a,b)} \equiv \frac{1}{B(a,b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt \qquad (a, b > 0)    (6.4.1)

It has the limiting values

    I_0(a,b) = 0, \qquad I_1(a,b) = 1    (6.4.2)

and the symmetry relation

    I_x(a,b) = 1 - I_{1-x}(b,a)    (6.4.3)

If a and b are both rather greater than one, then I_x(a,b) rises from "near-zero" to "near-unity" quite sharply at about x = a/(a+b). Figure 6.4.1 plots the function for several pairs (a,b).
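The symmetry relation (6.4.3) is easy to exercise numerically. The fragment below is not from the book; it assumes the betai routine given later in this section, plus gammln and nrerror, are linked in:

#include <stdio.h>
#include <math.h>

float betai(float a, float b, float x);

int main(void)
{
    float a=0.5f, b=5.0f, x=0.2f;
    float lhs = betai(a,b,x);
    float rhs = 1.0f - betai(b,a,1.0f-x);    /* Equation (6.4.3). */
    printf("I_x(a,b)=%g  1-I_{1-x}(b,a)=%g  diff=%g\n",
           lhs, rhs, fabs(lhs-rhs));
    return 0;
}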

The incomplete beta function has a series expansion

    I_x(a,b) = \frac{x^{a}(1-x)^{b}}{a\,B(a,b)}\left[1 + \sum_{n=0}^{\infty} \frac{B(a+1, n+1)}{B(a+b, n+1)}\,x^{n+1}\right],    (6.4.4)

but this does not prove to be very useful in its numerical evaluation. (Note, however, that the beta functions in the coefficients can be evaluated for each value of n with just the previous value and a few multiplies, using equations 6.1.9 and 6.1.3.) The continued fraction representation proves to be much more useful,

    I_x(a,b) = \frac{x^{a}(1-x)^{b}}{a\,B(a,b)}\left[\frac{1}{1+}\,\frac{d_1}{1+}\,\frac{d_2}{1+}\cdots\right]    (6.4.5)

where

    d_{2m+1} = -\frac{(a+m)(a+b+m)\,x}{(a+2m)(a+2m+1)}
    d_{2m} = \frac{m(b-m)\,x}{(a+2m-1)(a+2m)}    (6.4.6)

This continued fraction converges rapidly for x < (a+1)/(a+b+2), taking in the worst case O(\sqrt{\max(a,b)}) iterations. But for x > (a+1)/(a+b+2) we can just use the symmetry relation (6.4.3) to obtain an equivalent computation where the continued fraction will also converge rapidly. Hence we have

#include <math.h>

float betai(float a, float b, float x)
/* Returns the incomplete beta function I_x(a,b). */
{
    float betacf(float a, float b, float x);
    float gammln(float xx);
    void nrerror(char error_text[]);
    float bt;

    if (x < 0.0 || x > 1.0) nrerror("Bad x in routine betai");
    if (x == 0.0 || x == 1.0) bt=0.0;
    else    /* Factors in front of the continued fraction. */
        bt=exp(gammln(a+b)-gammln(a)-gammln(b)+a*log(x)+b*log(1.0-x));
    if (x < (a+1.0)/(a+b+2.0))    /* Use continued fraction directly. */
        return bt*betacf(a,b,x)/a;
    else                          /* Use continued fraction after making the symmetry
                                     transformation. */
        return 1.0-bt*betacf(b,a,1.0-x)/b;
}

which utilizes the continued fraction evaluation routine

#include <math.h>
#define MAXIT 100
#define EPS 3.0e-7
#define FPMIN 1.0e-30

float betacf(float a, float b, float x)
/* Used by betai: Evaluates continued fraction for incomplete beta function by modified
   Lentz's method (§5.2). */
{
    void nrerror(char error_text[]);

    int m,m2;
    float aa,c,d,del,h,qab,qam,qap;

    qab=a+b;    /* These q's will be used in factors that occur in the coefficients (6.4.6). */
    qap=a+1.0;
    qam=a-1.0;
    c=1.0;      /* First step of Lentz's method. */
    d=1.0-qab*x/qap;
    if (fabs(d) < FPMIN) d=FPMIN;
    d=1.0/d;
    h=d;
    for (m=1;m<=MAXIT;m++) {
        m2=2*m;
        aa=m*(b-m)*x/((qam+m2)*(a+m2));
        d=1.0+aa*d;    /* One step (the even one) of the recurrence. */
        if (fabs(d) < FPMIN) d=FPMIN;
        c=1.0+aa/c;
        if (fabs(c) < FPMIN) c=FPMIN;
        d=1.0/d;
        h *= d*c;
        aa = -(a+m)*(qab+m)*x/((a+m2)*(qap+m2));
        d=1.0+aa*d;    /* Next step of the recurrence (the odd one). */
        if (fabs(d) < FPMIN) d=FPMIN;
        c=1.0+aa/c;
        if (fabs(c) < FPMIN) c=FPMIN;
        d=1.0/d;
        del=d*c;
        h *= del;
        if (fabs(del-1.0) < EPS) break;    /* Are we done? */
    }
    if (m > MAXIT) nrerror("a or b too big, or MAXIT too small in betacf");
    return h;
}

Student's Distribution Probability Function

Student's distribution, denoted A(t|ν), is useful in several statistical contexts, notably in the test of whether two observed distributions have the same mean. A(t|ν) is the probability, for ν degrees of freedom, that a certain statistic t (measuring the observed difference of means) would be smaller than the observed value if the means were in fact the same. (See Chapter 14 for further details.) Two means are significantly different if, e.g., A(t|ν) > 0.99. In other words, 1 - A(t|ν) is the significance level at which the hypothesis that the means are equal is disproved.

The mathematical definition of the function is

    A(t|\nu) = \frac{1}{\nu^{1/2}\,B(\frac{1}{2},\frac{\nu}{2})} \int_{-t}^{t} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} dx    (6.4.7)

Limiting values are

    A(0|\nu) = 0, \qquad A(\infty|\nu) = 1    (6.4.8)

A(t|ν) is related to the incomplete beta function I_x(a,b) by

    A(t|\nu) = 1 - I_{\frac{\nu}{\nu+t^2}}\!\left(\frac{\nu}{2}, \frac{1}{2}\right)    (6.4.9)

So, you can use (6.4.9) and the above routine betai to evaluate the function.

F-Distribution Probability Function

This function occurs in the statistical test of whether two observed samples have the same variance. A certain statistic F, essentially the ratio of the observed dispersion of the first sample to that of the second one, is calculated. (For further details, see Chapter 14.) The probability that F would be as large as it is if the first sample's underlying distribution actually has smaller variance than the second's is denoted Q(F|ν1,ν2), where ν1 and ν2 are the number of degrees of freedom in the first and second samples, respectively. In other words, Q(F|ν1,ν2) is the significance level at which the hypothesis "1 has smaller variance than 2" can be rejected. A small numerical value implies a very significant rejection, in turn implying high confidence in the hypothesis "1 has variance greater or equal to 2."

Q(F|ν1,ν2) has the limiting values

    Q(0|\nu_1,\nu_2) = 1, \qquad Q(\infty|\nu_1,\nu_2) = 0    (6.4.10)

Its relation to the incomplete beta function I_x(a,b) as evaluated by betai above is

    Q(F|\nu_1,\nu_2) = I_{\frac{\nu_2}{\nu_2+\nu_1 F}}\!\left(\frac{\nu_2}{2}, \frac{\nu_1}{2}\right)    (6.4.11)

Cumulative Binomial Probability Distribution

Suppose an event occurs with probability p per trial. Then the probability P of its occurring k or more times in n trials is termed a cumulative binomial probability, and is related to the incomplete beta function I_x(a,b) as follows:

    P \equiv \sum_{j=k}^{n} \binom{n}{j} p^{j} (1-p)^{n-j} = I_p(k, n-k+1)    (6.4.12)

For n larger than a dozen or so, betai is a much better way to evaluate the sum in (6.4.12) than would be the straightforward sum with concurrent computation of the binomial coefficients. (For n smaller than a dozen, either method is acceptable.)

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapters 6 and 26.
Pearson, E., and Johnson, N. 1968, Tables of the Incomplete Beta Function (Cambridge: Cambridge University Press).
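The three statistical relations (6.4.9), (6.4.11), and (6.4.12) translate into one-line wrappers around betai. The sketch below is not from the book; the wrapper names are ours, and it assumes betai, gammln, and nrerror are linked in:

#include <stdio.h>

float betai(float a, float b, float x);

/* Student's A(t|nu), from equation (6.4.9). */
float student_a(float t, float nu)
{
    return 1.0f - betai(0.5f*nu, 0.5f, nu/(nu + t*t));
}

/* F-distribution Q(F|nu1,nu2), from equation (6.4.11). */
float fdist_q(float f, float nu1, float nu2)
{
    return betai(0.5f*nu2, 0.5f*nu1, nu2/(nu2 + nu1*f));
}

/* Probability of k or more successes in n trials, from equation (6.4.12). */
float binom_tail(int k, int n, float p)
{
    return betai((float) k, (float)(n-k+1), p);
}

int main(void)
{
    printf("A(2.0|10)            = %f\n", student_a(2.0f, 10.0f));
    printf("Q(1.5|5,10)          = %f\n", fdist_q(1.5f, 5.0f, 10.0f));
    printf("P(k>=3; n=10, p=0.2) = %f\n", binom_tail(3, 10, 0.2f));
    return 0;
}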

6.5 Bessel Functions of Integer Order

This section and the next one present practical algorithms for computing various kinds of Bessel functions of integer order. In §6.7 we deal with fractional order. In fact, the more complicated routines for fractional order work fine for integer order too. For integer order, however, the routines in this section (and §6.6) are simpler and faster. Their only drawback is that they are limited by the precision of the underlying rational approximations. For full double precision, it is best to work with the routines for fractional order in §6.7.

For any real ν, the Bessel function J_ν(x) can be defined by the series representation

    J_\nu(x) = \left(\frac{x}{2}\right)^{\nu} \sum_{k=0}^{\infty} \frac{(-\frac{1}{4}x^2)^k}{k!\,\Gamma(\nu+k+1)}    (6.5.1)

The series converges for all x, but it is not computationally very useful for x ≫ 1.

For ν not an integer the Bessel function Y_ν(x) is given by

    Y_\nu(x) = \frac{J_\nu(x)\cos(\nu\pi) - J_{-\nu}(x)}{\sin(\nu\pi)}    (6.5.2)

The right-hand side goes to the correct limiting value Y_n(x) as ν goes to some integer n, but this is also not computationally useful.

For arguments x < ν, both Bessel functions look qualitatively like simple power laws, with the asymptotic forms for 0 < x ≪ ν

    J_\nu(x) \approx \frac{1}{\Gamma(\nu+1)}\left(\frac{x}{2}\right)^{\nu} \qquad \nu \geq 0
    Y_0(x) \approx \frac{2}{\pi}\ln(x)
    Y_\nu(x) \approx -\frac{\Gamma(\nu)}{\pi}\left(\frac{2}{x}\right)^{\nu} \qquad \nu > 0    (6.5.3)

For x > ν, both Bessel functions look qualitatively like sine or cosine waves whose amplitude decays as x^{-1/2}. The asymptotic forms for x ≫ ν are

    J_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\,\cos\!\left(x - \frac{1}{2}\nu\pi - \frac{1}{4}\pi\right)
    Y_\nu(x) \sim \sqrt{\frac{2}{\pi x}}\,\sin\!\left(x - \frac{1}{2}\nu\pi - \frac{1}{4}\pi\right)    (6.5.4)

In the transition region where x ∼ ν, the typical amplitudes of the Bessel functions are on the order

    J_\nu(\nu) \sim \frac{2^{1/3}}{3^{2/3}\,\Gamma(\frac{2}{3})}\,\frac{1}{\nu^{1/3}} \approx \frac{0.4473}{\nu^{1/3}}
    Y_\nu(\nu) \sim -\frac{2^{1/3}}{3^{1/6}\,\Gamma(\frac{2}{3})}\,\frac{1}{\nu^{1/3}} \approx -\frac{0.7748}{\nu^{1/3}}    (6.5.5)

which holds asymptotically for large ν. Figure 6.5.1 plots the first few Bessel functions of each kind.

Figure 6.5.1. Bessel functions J_0(x) through J_3(x) and Y_0(x) through Y_2(x). [Figure: plot over 0 ≤ x ≤ 10.]

The Bessel functions satisfy the recurrence relations

    J_{n+1}(x) = \frac{2n}{x} J_n(x) - J_{n-1}(x)    (6.5.6)

and

    Y_{n+1}(x) = \frac{2n}{x} Y_n(x) - Y_{n-1}(x)    (6.5.7)

As already mentioned in §5.5, only the second of these (6.5.7) is stable in the direction of increasing n for x < n. The routines below therefore first compute J and Y for the orders n = 0 and n = 1 directly. For x < 8 they use rational approximations in x² (equation 6.5.8; for Y_0 and Y_1 these carry an extra term proportional to J_n(x) ln x for the logarithmic singularity), while for x ≥ 8 they use fitting functions of the form

    J_n(x) = \sqrt{\frac{2}{\pi x}}\left[P_n\!\left(\frac{8}{x}\right)\cos(X_n) - Q_n\!\left(\frac{8}{x}\right)\sin(X_n)\right]    (6.5.9)

and

    Y_n(x) = \sqrt{\frac{2}{\pi x}}\left[P_n\!\left(\frac{8}{x}\right)\sin(X_n) + Q_n\!\left(\frac{8}{x}\right)\cos(X_n)\right]    (6.5.10)

where

    X_n \equiv x - \frac{2n+1}{4}\pi    (6.5.11)

and where P_0, P_1, Q_0, and Q_1 are each polynomials in their arguments, for 0 < 8/x < 1. The P's are even polynomials, the Q's odd.

Coefficients of the various rational functions and polynomials are given by Hart [1], for various levels of desired accuracy. A straightforward implementation is

#include <math.h>

float bessj0(float x)
/* Returns the Bessel function J_0(x) for any real x. */
{
    float ax,z;
    double xx,y,ans,ans1,ans2;    /* Accumulate polynomials in double precision. */

    if ((ax=fabs(x)) < 8.0) {     /* Direct rational function fit. */
        y=x*x;
        ans1=57568490574.0+y*(-13362590354.0+y*(651619640.7
            +y*(-11214424.18+y*(77392.33017+y*(-184.9052456)))));
        ans2=57568490411.0+y*(1029532985.0+y*(9494680.718
            +y*(59272.64853+y*(267.8532712+y*1.0))));
        ans=ans1/ans2;
    } else {                      /* Fitting function (6.5.9). */
        z=8.0/ax;
        y=z*z;
        xx=ax-0.785398164;
        ans1=1.0+y*(-0.1098628627e-2+y*(0.2734510407e-4
            +y*(-0.2073370639e-5+y*0.2093887211e-6)));
        ans2 = -0.1562499995e-1+y*(0.1430488765e-3
            +y*(-0.6911147651e-5+y*(0.7621095161e-6
            -y*0.934945152e-7)));
        ans=sqrt(0.636619772/ax)*(cos(xx)*ans1-z*sin(xx)*ans2);
    }
    return ans;
}

#include <math.h>

float bessy0(float x)
/* Returns the Bessel function Y_0(x) for positive x. */
{
    float bessj0(float x);
    float z;
    double xx,y,ans,ans1,ans2;    /* Accumulate polynomials in double precision. */

    if (x < 8.0) {                /* Rational function approximation of (6.5.8). */
        y=x*x;
        ans1 = -2957821389.0+y*(7062834065.0+y*(-512359803.6
            +y*(10879881.29+y*(-86327.92757+y*228.4622733))));
        ans2=40076544269.0+y*(745249964.8+y*(7189466.438
            +y*(47447.26470+y*(226.1030244+y*1.0))));
        ans=(ans1/ans2)+0.636619772*bessj0(x)*log(x);
    } else {                      /* Fitting function (6.5.10). */

        z=8.0/x;
        y=z*z;
        xx=x-0.785398164;
        ans1=1.0+y*(-0.1098628627e-2+y*(0.2734510407e-4
            +y*(-0.2073370639e-5+y*0.2093887211e-6)));
        ans2 = -0.1562499995e-1+y*(0.1430488765e-3
            +y*(-0.6911147651e-5+y*(0.7621095161e-6
            +y*(-0.934945152e-7))));
        ans=sqrt(0.636619772/x)*(sin(xx)*ans1+z*cos(xx)*ans2);
    }
    return ans;
}

#include <math.h>

float bessj1(float x)
/* Returns the Bessel function J_1(x) for any real x. */
{
    float ax,z;
    double xx,y,ans,ans1,ans2;    /* Accumulate polynomials in double precision. */

    if ((ax=fabs(x)) < 8.0) {     /* Direct rational approximation. */
        y=x*x;
        ans1=x*(72362614232.0+y*(-7895059235.0+y*(242396853.1
            +y*(-2972611.439+y*(15704.48260+y*(-30.16036606))))));
        ans2=144725228442.0+y*(2300535178.0+y*(18583304.74
            +y*(99447.43394+y*(376.9991397+y*1.0))));
        ans=ans1/ans2;
    } else {                      /* Fitting function (6.5.9). */
        z=8.0/ax;
        y=z*z;
        xx=ax-2.356194491;
        ans1=1.0+y*(0.183105e-2+y*(-0.3516396496e-4
            +y*(0.2457520174e-5+y*(-0.240337019e-6))));
        ans2=0.04687499995+y*(-0.2002690873e-3
            +y*(0.8449199096e-5+y*(-0.88228987e-6
            +y*0.105787412e-6)));
        ans=sqrt(0.636619772/ax)*(cos(xx)*ans1-z*sin(xx)*ans2);
        if (x < 0.0) ans = -ans;
    }
    return ans;
}

#include <math.h>

float bessy1(float x)
/* Returns the Bessel function Y_1(x) for positive x. */
{
    float bessj1(float x);
    float z;
    double xx,y,ans,ans1,ans2;    /* Accumulate polynomials in double precision. */

    if (x < 8.0) {                /* Rational function approximation of (6.5.8). */
        y=x*x;
        ans1=x*(-0.4900604943e13+y*(0.1275274390e13
            +y*(-0.5153438139e11+y*(0.7349264551e9
            +y*(-0.4237922726e7+y*0.8511937935e4)))));
        ans2=0.2499580570e14+y*(0.4244419664e12
            +y*(0.3733650367e10+y*(0.2245904002e8
            +y*(0.1020426050e6+y*(0.3549632885e3+y)))));

        ans=(ans1/ans2)+0.636619772*(bessj1(x)*log(x)-1.0/x);
    } else {                      /* Fitting function (6.5.10). */
        z=8.0/x;
        y=z*z;
        xx=x-2.356194491;
        ans1=1.0+y*(0.183105e-2+y*(-0.3516396496e-4
            +y*(0.2457520174e-5+y*(-0.240337019e-6))));
        ans2=0.04687499995+y*(-0.2002690873e-3
            +y*(0.8449199096e-5+y*(-0.88228987e-6
            +y*0.105787412e-6)));
        ans=sqrt(0.636619772/x)*(sin(xx)*ans1+z*cos(xx)*ans2);
    }
    return ans;
}

We now turn to the second task, namely how to use the recurrence formulas (6.5.6) and (6.5.7) to get the Bessel functions J_n(x) and Y_n(x) for n ≥ 2. The latter of these is straightforward, since its upward recurrence is always stable:

float bessy(int n, float x)
/* Returns the Bessel function Y_n(x) for positive x and n >= 2. */
{
    float bessy0(float x);
    float bessy1(float x);
    void nrerror(char error_text[]);
    int j;
    float by,bym,byp,tox;

    if (n < 2) nrerror("Index n less than 2 in bessy");
    tox=2.0/x;
    by=bessy1(x);     /* Starting values for the recurrence. */
    bym=bessy0(x);
    for (j=1;j<n;j++) {    /* Recurrence (6.5.7). */
        byp=j*tox*by-bym;
        bym=by;
        by=byp;
    }
    return by;
}

The cousin routine bessj for J_n(x) is more complicated: upward recurrence by (6.5.6) is unstable for x < n, so in that regime the routine recurs downward from an arbitrary high starting order and normalizes the result afterwards, using the identity (5.5.16). The starting order for the downward recurrence is chosen to be larger than the desired n by

an additive amount of order [constant × n]^{1/2}, where the square root of the constant is, very roughly, the number of significant figures of accuracy.

The above considerations lead to the following function.

#include <math.h>
#define ACC 40.0        /* Make larger to increase accuracy. */
#define BIGNO 1.0e10
#define BIGNI 1.0e-10

float bessj(int n, float x)
/* Returns the Bessel function J_n(x) for any real x and n >= 2. */
{
    float bessj0(float x);
    float bessj1(float x);
    void nrerror(char error_text[]);
    int j,jsum,m;
    float ax,bj,bjm,bjp,sum,tox,ans;

    if (n < 2) nrerror("Index n less than 2 in bessj");
    ax=fabs(x);
    if (ax == 0.0) return 0.0;
    else if (ax > (float) n) {    /* Upwards recurrence from J_0 and J_1. */
        tox=2.0/ax;
        bjm=bessj0(ax);
        bj=bessj1(ax);
        for (j=1;j<n;j++) {
            bjp=j*tox*bj-bjm;
            bjm=bj;
            bj=bjp;
        }
        ans=bj;
    } else {                      /* Downwards recurrence from an even starting order m. */
        tox=2.0/ax;
        m=2*((n+(int) sqrt(ACC*n))/2);
        jsum=0;       /* jsum alternates between 0 and 1; when it is 1, we accumulate in
                         sum the even terms in (5.5.16). */
        bjp=ans=sum=0.0;
        bj=1.0;
        for (j=m;j>0;j--) {       /* The downward recurrence. */
            bjm=j*tox*bj-bjp;
            bjp=bj;
            bj=bjm;
            if (fabs(bj) > BIGNO) {    /* Renormalize to prevent overflows. */
                bj *= BIGNI;
                bjp *= BIGNI;
                ans *= BIGNI;
                sum *= BIGNI;
            }
            if (jsum) sum += bj;  /* Accumulate the sum. */
            jsum=!jsum;           /* Change 0 to 1 or vice versa. */
            if (j == n) ans=bjp;  /* Save the unnormalized answer. */
        }
        sum=2.0*sum-bj;           /* Compute (5.5.16) */
        ans /= sum;               /* and use it to normalize the answer. */
    }
    return x < 0.0 && (n & 1) ? -ans : ans;
}
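A brief joint check of the routines above (our driver, not the book's; it assumes bessj0, bessj1, bessy0, bessy1, bessj, bessy, and nrerror are all linked in) is the cross-product identity J_{n+1}(x)Y_n(x) - J_n(x)Y_{n+1}(x) = 2/(πx), which exercises both recurrence directions at once:

#include <stdio.h>

float bessj(int n, float x);
float bessy(int n, float x);

int main(void)
{
    int n = 2;
    float x = 5.0f;
    float lhs = bessj(n+1,x)*bessy(n,x) - bessj(n,x)*bessy(n+1,x);
    printf("J3*Y2 - J2*Y3 at x=5: %g  (expect 2/(pi*x) = %g)\n",
           lhs, 2.0/(3.141592653589793*x));
    return 0;
}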

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 9.
Hart, J.F., et al. 1968, Computer Approximations (New York: Wiley), §6.8, p. 141. [1]

6.6 Modified Bessel Functions of Integer Order

The modified Bessel functions I_n(x) and K_n(x) are equivalent to the usual Bessel functions J_n and Y_n evaluated for purely imaginary arguments. In detail, the relationship is

    I_n(x) = (-i)^n J_n(ix)
    K_n(x) = \frac{\pi}{2} i^{n+1} \left[J_n(ix) + i Y_n(ix)\right]    (6.6.1)

The particular choice of prefactor and of the linear combination of J_n and Y_n to form K_n are simply choices that make the functions real-valued for real arguments x.

For small arguments x ≪ n, both I_n(x) and K_n(x) become, asymptotically, simple powers of their argument

    I_n(x) \approx \frac{1}{n!}\left(\frac{x}{2}\right)^{n} \qquad n \geq 0
    K_0(x) \approx -\ln(x)
    K_n(x) \approx \frac{(n-1)!}{2}\left(\frac{x}{2}\right)^{-n} \qquad n > 0    (6.6.2)

These expressions are virtually identical to those for J_n(x) and Y_n(x) in this region, except for the factor of -2/π difference between Y_n(x) and K_n(x). In the region x ≫ n, however, the modified functions have quite different behavior than the Bessel functions,

    I_n(x) \approx \frac{1}{\sqrt{2\pi x}} \exp(x)
    K_n(x) \approx \frac{\pi}{\sqrt{2\pi x}} \exp(-x)    (6.6.3)

The modified functions evidently have exponential rather than sinusoidal behavior for large arguments (see Figure 6.6.1). The smoothness of the modified Bessel functions, once the exponential factor is removed, makes a simple polynomial approximation of a few terms quite suitable for the functions I_0, I_1, K_0, and K_1. The following routines, based on polynomial coefficients given by Abramowitz and Stegun [1], evaluate these four functions, and will provide the basis for upward recursion for n > 1 when x > n.

Figure 6.6.1. Modified Bessel functions I_0(x) through I_3(x), and K_0(x) through K_2(x). [Figure: plot over 0 ≤ x ≤ 4.]

#include <math.h>

float bessi0(float x)
/* Returns the modified Bessel function I_0(x) for any real x. */
{
    float ax,ans;
    double y;    /* Accumulate polynomials in double precision. */

    if ((ax=fabs(x)) < 3.75) {    /* Polynomial fit. */
        y=x/3.75;
        y*=y;
        ans=1.0+y*(3.5156229+y*(3.0899424+y*(1.2067492
            +y*(0.2659732+y*(0.360768e-1+y*0.45813e-2)))));
    } else {
        y=3.75/ax;
        ans=(exp(ax)/sqrt(ax))*(0.39894228+y*(0.1328592e-1
            +y*(0.225319e-2+y*(-0.157565e-2+y*(0.916281e-2
            +y*(-0.2057706e-1+y*(0.2635537e-1+y*(-0.1647633e-1
            +y*0.392377e-2))))))));
    }
    return ans;
}

#include <math.h>

float bessk0(float x)
/* Returns the modified Bessel function K_0(x) for positive real x. */
{
    float bessi0(float x);
    double y,ans;    /* Accumulate polynomials in double precision. */

    if (x <= 2.0) {    /* Polynomial fit. */
        y=x*x/4.0;
        ans=(-log(x/2.0)*bessi0(x))+(-0.57721566+y*(0.42278420
            +y*(0.23069756+y*(0.3488590e-1+y*(0.262698e-2
            +y*(0.10750e-3+y*0.74e-5))))));
    } else {
        y=2.0/x;
        ans=(exp(-x)/sqrt(x))*(1.25331414+y*(-0.7832358e-1
            +y*(0.2189568e-1+y*(-0.1062446e-1+y*(0.587872e-2
            +y*(-0.251540e-2+y*0.53208e-3))))));
    }
    return ans;
}

#include <math.h>

float bessi1(float x)
/* Returns the modified Bessel function I_1(x) for any real x. */
{
    float ax,ans;
    double y;    /* Accumulate polynomials in double precision. */

    if ((ax=fabs(x)) < 3.75) {    /* Polynomial fit. */
        y=x/3.75;
        y*=y;
        ans=ax*(0.5+y*(0.87890594+y*(0.51498869+y*(0.15084934
            +y*(0.2658733e-1+y*(0.301532e-2+y*0.32411e-3))))));
    } else {
        y=3.75/ax;
        ans=0.2282967e-1+y*(-0.2895312e-1+y*(0.1787654e-1
            -y*0.420059e-2));
        ans=0.39894228+y*(-0.3988024e-1+y*(-0.362018e-2
            +y*(0.163801e-2+y*(-0.1031555e-1+y*ans))));
        ans *= (exp(ax)/sqrt(ax));
    }
    return x < 0.0 ? -ans : ans;
}

#include <math.h>

float bessk1(float x)
/* Returns the modified Bessel function K_1(x) for positive real x. */
{
    float bessi1(float x);
    double y,ans;    /* Accumulate polynomials in double precision. */

    if (x <= 2.0) {    /* Polynomial fit. */
        y=x*x/4.0;
        ans=(log(x/2.0)*bessi1(x))+(1.0/x)*(1.0+y*(0.15443144
            +y*(-0.67278579+y*(-0.18156897+y*(-0.1919402e-1
            +y*(-0.110404e-2+y*(-0.4686e-4)))))));
    } else {
        y=2.0/x;
        ans=(exp(-x)/sqrt(x))*(1.25331414+y*(0.23498619
            +y*(-0.3655620e-1+y*(0.1504268e-1+y*(-0.780353e-2
            +y*(0.325614e-2+y*(-0.68245e-3)))))));
    }
    return ans;
}
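The four kernel routines can be checked jointly against the Wronskian relation for the modified functions, I_0(x)K_1(x) + I_1(x)K_0(x) = 1/x (equation (6.7.20) of the next section, specialized with I_0' = I_1 and K_0' = -K_1). A minimal driver, not from the book, assuming the four routines are linked in:

#include <stdio.h>

float bessi0(float x), bessi1(float x);
float bessk0(float x), bessk1(float x);

int main(void)
{
    float x;
    for (x = 0.5f; x <= 8.0f; x *= 2.0f)
        printf("x=%4.1f  I0*K1+I1*K0 = %.7f   1/x = %.7f\n",
               x, bessi0(x)*bessk1(x)+bessi1(x)*bessk0(x), 1.0f/x);
    return 0;
}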

The recurrence relation for I_n(x) and K_n(x) is the same as that for J_n(x) and Y_n(x) provided that ix is substituted for x. This has the effect of changing a sign in the relation,

    I_{n+1}(x) = -\frac{2n}{x} I_n(x) + I_{n-1}(x)
    K_{n+1}(x) = +\frac{2n}{x} K_n(x) + K_{n-1}(x)    (6.6.4)

These relations are always unstable for upward recurrence. For K_n, itself growing, this presents no problem. For I_n, however, the strategy of downward recursion is therefore required once again, and the starting point for the recursion may be chosen in the same manner as for the routine bessj. The only fundamental difference is that the normalization formula for I_n(x) has an alternating minus sign in successive terms, which again arises from the substitution of ix for x in the formula used previously for J_n:

    1 = I_0(x) - 2I_2(x) + 2I_4(x) - 2I_6(x) + \cdots    (6.6.5)

In fact, we prefer simply to normalize with a call to bessi0. With this simple modification, the recursion routines bessj and bessy become the new routines bessi and bessk:

float bessk(int n, float x)
/* Returns the modified Bessel function K_n(x) for positive x and n >= 2. */
{
    float bessk0(float x);
    float bessk1(float x);
    void nrerror(char error_text[]);
    int j;
    float bk,bkm,bkp,tox;

    if (n < 2) nrerror("Index n less than 2 in bessk");
    tox=2.0/x;
    bkm=bessk0(x);    /* Upward recurrence for all x... */
    bk=bessk1(x);
    for (j=1;j<n;j++) {    /* ...and here it is. */
        bkp=bkm+j*tox*bk;
        bkm=bk;
        bk=bkp;
    }
    return bk;
}

#include <math.h>
#define ACC 40.0    /* Make larger to increase accuracy. */
#define BIGNO 1.0e10
#define BIGNI 1.0e-10

float bessi(int n, float x)
/* Returns the modified Bessel function I_n(x) for any real x and n >= 2. */
{
    float bessi0(float x);
    void nrerror(char error_text[]);

    int j;
    float bi,bim,bip,tox,ans;

    if (n < 2) nrerror("Index n less than 2 in bessi");
    if (x == 0.0)
        return 0.0;
    else {
        tox=2.0/fabs(x);
        bip=ans=0.0;
        bi=1.0;
        for (j=2*(n+(int) sqrt(ACC*n));j>0;j--) {    /* Downward recurrence from even m. */
            bim=bip+j*tox*bi;
            bip=bi;
            bi=bim;
            if (fabs(bi) > BIGNO) {    /* Renormalize to prevent overflows. */
                ans *= BIGNI;
                bi *= BIGNI;
                bip *= BIGNI;
            }
            if (j == n) ans=bip;
        }
        ans *= bessi0(x)/bi;    /* Normalize with bessi0. */
        return x < 0.0 && (n & 1) ? -ans : ans;
    }
}

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), §9.8. [1]
Carrier, G.F., Krook, M., and Pearson, C.E. 1966, Functions of a Complex Variable (New York: McGraw-Hill), pp. 220ff.

6.7 Bessel Functions of Fractional Order, Airy Functions, Spherical Bessel Functions

Many algorithms have been proposed for computing Bessel functions of fractional order numerically. Most of them are, in fact, not very good in practice. The routines given here are rather complicated, but they can be recommended wholeheartedly.

Ordinary Bessel Functions

The basic idea is Steed's method, which was originally developed for Coulomb wave functions [1]. The method calculates J_ν, J'_ν, Y_ν, and Y'_ν simultaneously, and so involves four relations among these functions. Three of the relations come from two continued fractions, one of which is complex. The fourth is provided by the Wronskian relation

    W \equiv J_\nu Y'_\nu - Y_\nu J'_\nu = \frac{2}{\pi x}    (6.7.1)

The first continued fraction, CF1, is defined by

    f_\nu \equiv \frac{J'_\nu}{J_\nu} = \frac{\nu}{x} - \frac{J_{\nu+1}}{J_\nu}
              = \frac{\nu}{x} - \frac{1}{2(\nu+1)/x-}\,\frac{1}{2(\nu+2)/x-}\cdots    (6.7.2)
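The next paragraph notes that (6.7.2) follows directly from the three-term recurrence. For convenience, here is the one-step chain (our sketch, not the book's text): writing the recurrence (6.5.6) at order ν+1 as J_\nu + J_{\nu+2} = \frac{2(\nu+1)}{x} J_{\nu+1} and dividing through by J_{\nu+1} gives

    \frac{J_{\nu+1}}{J_\nu} = \cfrac{1}{\dfrac{2(\nu+1)}{x} - \dfrac{J_{\nu+2}}{J_{\nu+1}}}

and iterating the same step on J_{\nu+2}/J_{\nu+1} unrolls into the continued fraction of (6.7.2); the leading ν/x term comes from the standard derivative relation J'_\nu = (\nu/x) J_\nu - J_{\nu+1}.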

You can easily derive it from the three-term recurrence relation for Bessel functions: Start with equation (6.5.6) and use equation (5.5.18). Forward evaluation of the continued fraction by one of the methods of §5.2 is essentially equivalent to backward recurrence of the recurrence relation. The rate of convergence of CF1 is determined by the position of the turning point x_tp = \sqrt{\nu(\nu+1)} \approx \nu, beyond which the Bessel functions become oscillatory. If x ≲ x_tp, convergence is very rapid. If x ≳ x_tp, then each iteration of the continued fraction effectively increases ν by one until x ≲ x_tp; thereafter rapid convergence sets in. Thus the number of iterations of CF1 is of order x for large x. In the routine bessjy we set the maximum allowed number of iterations to 10,000. For larger x, you can use the usual asymptotic expressions for Bessel functions.

One can show that the sign of J_ν is the same as the sign of the denominator of CF1 once it has converged.

The complex continued fraction CF2 is defined by

    p + iq \equiv \frac{J'_\nu + i Y'_\nu}{J_\nu + i Y_\nu}
                = -\frac{1}{2x} + i + \frac{i}{x}\,\frac{(1/2)^2 - \nu^2}{2(x+i)+}\,\frac{(3/2)^2 - \nu^2}{2(x+2i)+}\cdots    (6.7.3)

(We sketch the derivation of CF2 in the analogous case of modified Bessel functions in the next subsection.) This continued fraction converges rapidly for x ≳ x_tp, while convergence fails as x → 0. We have to adopt a special method for small x, which we describe below. For x not too small, we can ensure that x ≳ x_tp by a stable recurrence of J_ν and J'_ν downwards to a value ν = μ ≲ x, thus yielding the ratio f_μ at this lower value of ν. This is the stable direction for the recurrence relation. The initial values for the recurrence are

    J_\nu = \text{arbitrary}, \qquad J'_\nu = f_\nu J_\nu,    (6.7.4)

with the sign of the arbitrary initial value of J_ν chosen to be the sign of the denominator of CF1. Choosing the initial value of J_ν very small minimizes the possibility of overflow during the recurrence. The recurrence relations are

    J_{\nu-1} = \frac{\nu}{x} J_\nu + J'_\nu
    J'_{\nu-1} = \frac{\nu-1}{x} J_{\nu-1} - J_\nu    (6.7.5)

Once CF2 has been evaluated at ν = μ, then with the Wronskian (6.7.1) we have enough relations to solve for all four quantities. The formulas are simplified by introducing the quantity

    \gamma \equiv \frac{p - f_\mu}{q}    (6.7.6)

Then

    J_\mu = \pm\left(\frac{W}{q + \gamma(p - f_\mu)}\right)^{1/2}    (6.7.7)
    J'_\mu = f_\mu J_\mu    (6.7.8)
    Y_\mu = \gamma J_\mu    (6.7.9)
    Y'_\mu = Y_\mu\left(p + \frac{q}{\gamma}\right)    (6.7.10)

The sign of J_μ in (6.7.7) is chosen to be the same as the sign of the initial J_ν in (6.7.4). Once all four functions have been determined at the value ν = μ, we can find them at the original value of ν. For J_ν and J'_ν, simply scale the values in (6.7.4) by the ratio of (6.7.7) to the value found after applying the recurrence (6.7.5).
The quantities Y_ν and Y'_ν can be found by starting with the values in (6.7.9) and (6.7.10) and using the stable upwards recurrence

    Y_{\nu+1} = \frac{2\nu}{x} Y_\nu - Y_{\nu-1}    (6.7.11)

together with the relation

    Y'_\nu = \frac{\nu}{x} Y_\nu - Y_{\nu+1}    (6.7.12)
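For the reader who wants the missing algebra behind (6.7.7), here is a sketch (ours, not the book's text): substituting Y_\mu = \gamma J_\mu and Y'_\mu = (p + q/\gamma)\,\gamma J_\mu from (6.7.9)-(6.7.10), together with J'_\mu = f_\mu J_\mu, into the Wronskian (6.7.1) gives

    W = J_\mu Y'_\mu - Y_\mu J'_\mu = J_\mu^2\left[\gamma p + q - \gamma f_\mu\right] = J_\mu^2\left[q + \gamma(p - f_\mu)\right],

which solves to the square root in (6.7.7).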

266 242 Chapter 6. Special Functions [2] Now turn to the case of small x , when CF2 is not suitable. Temme has given a ′ Y good method of evaluating Y from (6.7.12), by series expansions , and hence and Y +1 ν ν ν that accurately handle the singularity as x → 0 . The expansions work only for | ν |≤ 1 / 2 , and so now the recurrence (6.7.5) is used to evaluate f = in this interval. ν μ at a value ν Then one calculates J from μ W J ) = 6.7.13 ( μ Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. ′ Y f − Y μ μ μ ′ from (6.7.8). The values at the original value of ν are determined by scaling as before, and J μ Y ’s are recurred up as before. and the Temme’s series are ∞ ∞ ∑ ∑ 2 − 6.7.14 ) c ( c h g = Y − = Y +1 ν ν k k k k x =0 k k =0 Here k 2 ( − x / 4) c = ) 6.7.15 ( k ! k f , and and , that can are defined in terms of quantities p q h while the coefficients g k k k k k be found by recursion: ) ( νπ 2 2 q g = + f sin k k k ν 2 h p = − kg + k k k p 1 − k p = k ) 6.7.16 ( ν − k q 1 k − = q k + ν k kf + p q + 1 k − 1 k − k − 1 = f k 2 2 − ν k The initial values for the recurrences are ( ) − ν x 1 ) Γ(1+ = ν p 0 2 π ( ) ν x 1 ) − ν = Γ(1 q 0 ) 6.7.17 ( 2 π ] ) ( [ 2 νπ sinh σ 2 )+ ν ( ( = ν ) f Γ ln cosh Γ σ 0 2 1 νπ sin π x σ with ) ( 2 ln σ = ν x ] [ 1 1 1 − ( ν )= Γ ) 6.7.18 ( 1 ν ν Γ(1 Γ(1+ 2 ) − ν ) ] [ 1 1 1 + )= ν ( Γ 2 ν 2 Γ(1 Γ(1+ − ) ν ) g of machine- isit website The whole point of writing the formulas in this way is that the potential problems as ν → 0 ica). can be controlled by evaluating νπ/ νπ , sinh σ/σ , and Γ sin carefully. In particular, Temme 1 Γ gives Chebyshev expansions for ( ν ) and Γ . We have rearranged his expansion for ( ν Γ ) 2 1 1 ν to be explicitly an even series in chebev as explained in § 5.8. so that we can use our routine The routine assumes ν ≥ 0 . For negative ν you can use the reflection formulas =cos νπJ − sin νπY J ν ν ν − ) 6.7.19 ( =sin +cos νπY Y νπJ ν ν ν − The routine also assumes x> 0 .For x< 0 the functions are in general complex, but expressible in terms of functions with x> 0 .For x =0 , Y is singular. ν

267 6.7 Bessel Functions of Fractional Order 243 Internal arithmetic in the routine is carried out in double precision. The complex arithmetic is carried out explicitly with real variables. #include #include "nrutil.h" #define EPS 1.0e-10 #define FPMIN 1.0e-30 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer #define MAXIT 10000 #define XMIN 2.0 #define PI 3.141592653589793 void bessjy(float x, float xnu, float *rj, float *ry, float *rjp, float *ryp) ′ ′ Returns the Bessel functions rj = J Y , ry = Y ,for and their derivatives rjp = J = ryp , ν ν ν ν . The relative accuracy is within one or two significant digits x and for xnu = ν ≥ 0 positive of controls its absolute accuracy. EPS EPS , except near a zero of one of the functions, where is a number close to the machine’s smallest floating-point number. All internal arithmetic FPMIN is in double precision. To convert the entire routine to double precision, change the float − 16 EPS and decrease to . 10 declarations above to double . Also convert the function beschb { void beschb(double x, double *gam1, double *gam2, double *gampl, double *gammi); int i,isign,l,nl; double a,b,br,bi,c,cr,ci,d,del,del1,den,di,dlr,dli,dr,e,f,fact,fact2, fact3,ff,gam,gam1,gam2,gammi,gampl,h,p,pimu,pimu2,q,r,rjl, rjl1,rjmu,rjp1,rjpl,rjtemp,ry1,rymu,rymup,rytemp,sum,sum1, temp,w,x2,xi,xi2,xmu,xmu2; if (x <= 0.0 || xnu < 0.0) nrerror("bad arguments in bessjy"); nl=(x < XMIN ? (int)(xnu+0.5) : IMAX(0,(int)(xnu-x+1.5))); is the number of downward recurrences of the J ’s and upward recurrences of Y ’s. xmu nl lies between − 1 / 2 and 1/2 for x < XMIN , while it is chosen so that x is greater than the . turning point for x ≥ XMIN xmu=xnu-nl; xmu2=xmu*xmu; xi=1.0/x; xi2=2.0*xi; w=xi2/PI; The Wronskian. § 5.2). isign=1; Evaluate CF1 by modified Lentz’s method ( isign h=xnu*xi; keeps track of sign changes in the de- nominator. if (h < FPMIN) h=FPMIN; b=xi2*xnu; d=0.0; c=h; for (i=1;i<=MAXIT;i++) { b += xi2; d=b-d; if (fabs(d) < FPMIN) d=FPMIN; c=b-1.0/c; if (fabs(c) < FPMIN) c=FPMIN; d=1.0/d; del=c*d; g of machine- h=del*h; isit website if (d < 0.0) isign = -isign; ica). if (fabs(del-1.0) < EPS) break; } if (i > MAXIT) nrerror("x too large in bessjy; try asymptotic expansion"); ′ rjl=isign*FPMIN; Initialize J J for downward recurrence. and ν ν rjpl=h*rjl; rjl1=rjl; Store values for later rescaling. rjp1=rjpl; fact=xnu*xi; for (l=nl;l>=1;l--) { rjtemp=fact*rjl+rjpl; fact -= xi;

268 244 Special Functions Chapter 6. rjpl=fact*rjtemp-rjl; rjl=rjtemp; } if (rjl == 0.0) rjl=EPS; ′ f=rjpl/rjl; Now have unnormalized J J and . μ μ Use series. if (x < XMIN) { x2=0.5*x; pimu=PI*xmu; Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v fact = (fabs(pimu) < EPS ? 1.0 : pimu/sin(pimu)); d = -log(x2); e=xmu*d; fact2 = (fabs(e) < EPS ? 1.0 : sinh(e)/e); beschb(xmu,&gam1,&gam2,&gampl,&gammi); Chebyshev evaluation of Γ and Γ . 1 2 . ff=2.0/PI*fact*(gam1*cosh(e)+gam2*fact2*d); f 0 e=exp(e); p=e/(gampl*PI); p . 0 . q=1.0/(e*PI*gammi); q 0 pimu2=0.5*pimu; fact3 = (fabs(pimu2) < EPS ? 1.0 : sin(pimu2)/pimu2); r=PI*pimu2*fact3*fact3; c=1.0; d = -x2*x2; sum=ff+r*q; sum1=p; for (i=1;i<=MAXIT;i++) { ff=(i*ff+p+q)/(i*i-xmu2); c *= (d/i); p /= (i-xmu); q /= (i+xmu); del=c*(ff+r*q); sum += del; del1=c*p-i*del; sum1 += del1; if (fabs(del) < (1.0+fabs(sum))*EPS) break; } if (i > MAXIT) nrerror("bessy series failed to converge"); rymu = -sum; ry1 = -sum1*xi2; rymup=xmu*xi*rymu-ry1; rjmu=w/(rymup-f*rymu); Equation (6.7.13). } else { § 5.2). Evaluate CF2 by modified Lentz’s method ( a=0.25-xmu2; p = -0.5*xi; q=1.0; br=2.0*x; bi=2.0; fact=a*xi/(p*p+q*q); cr=br+q*fact; ci=bi+p*fact; den=br*br+bi*bi; dr=br/den; g of machine- di = -bi/den; isit website dlr=cr*dr-ci*di; ica). dli=cr*di+ci*dr; temp=p*dlr-q*dli; q=p*dli+q*dlr; p=temp; for (i=2;i<=MAXIT;i++) { a += 2*(i-1); bi += 2.0; dr=a*dr+br; di=a*di+bi; if (fabs(dr)+fabs(di) < FPMIN) dr=FPMIN; fact=a/(cr*cr+ci*ci);

269 6.7 Bessel Functions of Fractional Order 245 cr=br+cr*fact; ci=bi-ci*fact; if (fabs(cr)+fabs(ci) < FPMIN) cr=FPMIN; den=dr*dr+di*di; dr /= den; di /= -den; dlr=cr*dr-ci*di; dli=cr*di+ci*dr; Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer temp=p*dlr-q*dli; q=p*dli+q*dlr; p=temp; if (fabs(dlr-1.0)+fabs(dli) < EPS) break; } if (i > MAXIT) nrerror("cf2 failed in bessjy"); gam=(p-f)/q; Equations (6.7.6) – (6.7.10). rjmu=sqrt(w/((p-f)*gam+q)); rjmu=SIGN(rjmu,rjl); rymu=rjmu*gam; rymup=rymu*(p+q/gam); ry1=xmu*xi*rymu-rymup; } fact=rjmu/rjl; ′ *rj=rjl1*fact; Scale original J J and . ν ν *rjp=rjp1*fact; . Upward recurrence of Y for (i=1;i<=nl;i++) { ν rytemp=(xmu+i)*xi2*ry1-rymu; rymu=ry1; ry1=rytemp; } *ry=rymu; *ryp=xnu*xi*rymu-ry1; } #define NUSE1 5 #define NUSE2 5 void beschb(double x, double *gam1, double *gam2, double *gampl, double *gammi) Γ Evaluates / 2 Γ ) by Chebyshev expansion for | x |≤ 1 and x . Also returns 1 / Γ(1+ and 1 2 / 1 Γ(1 − =7 . x ) . If converting to double precision, set NUSE1 =8 , NUSE2 { float chebev(float a, float b, float c[], int m, float x); float xx; static float c1[] = { -1.142022680371168e0,6.5165112670737e-3, 3.087090173086e-4,-3.4706269649e-6,6.9437664e-9, 3.67795e-11,-1.356e-13}; g of machine- static float c2[] = { isit website 1.843740587300905e0,-7.68528408447867e-2, ica). 1.2719271366546e-3,-4.9717367042e-6,-3.31261198e-8, 2.423096e-10,-1.702e-13,-1.49e-15}; xx=8.0*x*x-1.0; Multiply x by2tomakerangebe − 1 to 1, and then apply transformation for eval- *gam1=chebev(-1.0,1.0,c1,NUSE1,xx); uating even Chebyshev series. *gam2=chebev(-1.0,1.0,c2,NUSE2,xx); *gampl= *gam2-x*(*gam1); *gammi= *gam2+x*(*gam1); }

270 246 Chapter 6. Special Functions Modified Bessel Functions Steed’s method does not work for modified Bessel functions because in this case CF2 is [3] purely imaginary and we have only three relations among the four functions. Temme has given a normalization condition that provides the fourth relation. The Wronskian relation is Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v 1 ′ ′ 6.7.20 ( ) − I − K K = ≡ I W ν ν ν ν x The continued fraction CF1 becomes ′ I ν 1 1 ν f = ≡ + ··· ( 6.7.21 ) ν + I 2( x +2) /x + 2( ν +1) /x ν ν To get CF2 and the normalization condition in a convenient form, consider the sequence of confluent hypergeometric functions 6.7.22 ( x )= U ( ν +1 / 2+ n, 2 ν +1 , 2 ) )( x z n ν . for fixed Then 1 / 2 ν − x x (2 x ) )= e π x z ) ( ( )( 6.7.23 K 0 ν ( [ ) ] z 1 1 1 K ) x ( ν 1 +1 2 = ν + x + + ν ) 6.7.24 ( − x ( 2 z x K 4 ) 0 ν K Equation (6.7.23) is the standard expression for in terms of a confluent hypergeometric ν function, while equation (6.7.24) follows from relations between contiguous confluent hy- pergeometric functions (equations 13.4.16 and 13.4.18 in Abramowitz and Stegun). Now z the functions satisfy the three-term recurrence relation (equation 13.4.15 in Abramowitz n and Stegun) ( ( 6.7.25 z a )+ x ( x )= b ) z z n n n +1 1 n +1 n − with b n + x ) =2( n 6.7.26 ( ) 2 2 n +1 / 2) = − ν − ] [( a +1 n Following the steps leading to equation (5.5.18), we get the continued fraction CF2 a z 1 1 2 = ( ··· ) 6.7.27 + + b z b 2 1 0 ′ /K /K K . and thus from which (6.7.24) gives K ν ν ν +1 ν Temme’s normalization condition is that ( ) ∞ +1 ν 2 / ∑ 1 C ) z 6.7.28 ( = n n x 2 =0 n where n ( − 1) +1 ) n Γ( 2+ ν / ) 6.7.29 ( C = n n Γ( ν +1 / 2 − n ) ! g of machine- isit website Note that the C ’s can be determined by recursion: n ica). a +1 n ,C ( ) = − 6.7.30 =1 C C +1 n n 0 +1 n We use the condition (6.7.28) by finding ∞ ∑ z n 6.7.31 ( C ) = S n z 0 n =1 Then ) ( ν +1 / 2 1 1 z ) 6.7.32 ( = 0 x 1+ S 2

271 6.7 Bessel Functions of Fractional Order 247 . K and (6.7.23) gives ν [4] Thompson and Barnett have given a clever method of doing the sum (6.7.31) simultaneously with the forward evaluation of the continued fraction CF2. Suppose the continued fraction is being evaluated as ∞ ∑ z 1 6.7.33 ( h = ) ∆ n z 0 =0 n readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer are being found by, e.g., Steed’s algorithm or the modified Lentz’s h ∆ where the increments n algorithm of 5.2. Then the approximation to S keeping the first N terms can be found as § N ∑ S = ) Q ∆ h 6.7.34 ( N n n n =1 Here n ∑ ( = ) C q 6.7.35 Q n k k =1 k is found by recursion from and q k 6.7.36 /a =( q ( b q ) ) − q k k k k − 1 k +1 +1 , q =1 . For the case at hand, approximately three times as many terms =0 starting with q 0 1 to converge as are needed simply for CF2 to converge. S are needed to get K To find and K for small x we use series analogous to (6.7.14): ν ν +1 ∞ ∞ ∑ ∑ 2 ) 6.7.37 c ( f = K h c = K +1 ν ν k k k k x =0 k k =0 Here k 2 ( x / 4) = c k k ! p = − kf + h k k k p k − 1 p = k 6.7.38 ( ) − ν k q − k 1 = q k ν k + kf q + p + 1 1 − − k k k − 1 = f k 2 2 k − ν The initial values for the recurrences are ( ) − ν x 1 = ) Γ(1+ ν p 0 2 2 ( ) ν x 1 ) ν = Γ(1 − q 0 ) 6.7.39 ( 2 2 [ ) ( ] 2 νπ sinh σ cosh ln Γ σ Γ ( = ν )+ ( ν ) f 2 1 0 νπ sin x σ , and CF2 and the normalization relation (6.7.28) require x Both the series for small = in this interval, find μ ν down to a value 2 . In both cases, therefore, we recurse ν / 1 | |≤ I ν g of machine- isit website K there, and recurse K back up to the original value of ν . μ ν ica). ν ≥ 0 . For negative ν use the reflection formulas The routine assumes 2 ) sin( νπ K = I + I − ν ν ν π ( 6.7.40 ) K = K − ν ν x − x ∼ ∼ e , K , and so these functions will overflow or e , Note that for large I x ν ν − x x underflow. It is often desirable to be able to compute the scaled quantities e . and e I K ν ν − x Simply omitting the factor e in equation (6.7.23) will ensure that all four quantities will have the appropriate scaling. If you also want to scale the four quantities for small x when x the series in equation (6.7.37) are used, you must multiply each series by e .

272 248 Chapter 6. Special Functions #include #define EPS 1.0e-10 #define FPMIN 1.0e-30 #define MAXIT 10000 #define XMIN 2.0 #define PI 3.141592653589793 void bessik(float x, float xnu, float *ri, float *rk, float *rip, float *rkp) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North Amer Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v ′ Returns the modified Bessel functions rip ri rk = K , and their derivatives I = I = , ν ν ν ′ = K . The relative accuracy is within one or two 0 ,forpositive x and for xnu = rkp ≥ ν ν significant digits of . FPMIN is a number close to the machine’s smallest floating-point EPS number. All internal arithmetic is in double precision. To convert the entire routine to double − 16 precision, change the double and decrease EPS to 10 .Also float declarations above to . beschb convert the function { void beschb(double x, double *gam1, double *gam2, double *gampl, double *gammi); void nrerror(char error_text[]); int i,l,nl; double a,a1,b,c,d,del,del1,delh,dels,e,f,fact,fact2,ff,gam1,gam2, gammi,gampl,h,p,pimu,q,q1,q2,qnew,ril,ril1,rimu,rip1,ripl, ritemp,rk1,rkmu,rkmup,rktemp,s,sum,sum1,x2,xi,xi2,xmu,xmu2; if (x <= 0.0 || xnu < 0.0) nrerror("bad arguments in bessik"); nl=(int)(xnu+0.5); nl is the number of downward re- currences of the I ’s and upward xmu=xnu-nl; K lies be- recurrences of xmu2=xmu*xmu; ’s. xmu − 2 and 1/2. tween xi=1.0/x; / 1 xi2=2.0*xi; h=xnu*xi; Evaluate CF1 by modified Lentz’s § 5.2). if (h < FPMIN) h=FPMIN; method ( b=xi2*xnu; d=0.0; c=h; for (i=1;i<=MAXIT;i++) { b += xi2; Denominators cannot be zero here, d=1.0/(b+d); so no need for special precau- c=b+1.0/c; tions. del=c*d; h=del*h; if (fabs(del-1.0) < EPS) break; } if (i > MAXIT) nrerror("x too large in bessik; try asymptotic expansion"); ′ Initialize I ril=FPMIN; and I for downward re- ν ν ripl=h*ril; currence. ril1=ril; Store values for later rescaling. rip1=ripl; fact=xnu*xi; for (l=nl;l>=1;l--) { ritemp=fact*ril+ripl; fact -= xi; g of machine- ripl=fact*ritemp+ril; isit website ril=ritemp; ica). } ′ I Now have unnormalized f=ripl/ril; I and . μ μ if (x < XMIN) { Use series. x2=0.5*x; pimu=PI*xmu; fact = (fabs(pimu) < EPS ? 1.0 : pimu/sin(pimu)); d = -log(x2); e=xmu*d; fact2 = (fabs(e) < EPS ? 1.0 : sinh(e)/e); beschb(xmu,&gam1,&gam2,&gampl,&gammi); Chebyshev evaluation of Γ . and Γ 2 1 . ff=fact*(gam1*cosh(e)+gam2*fact2*d); f 0

        sum=ff;
        e=exp(e);
        p=0.5*e/gampl;      /* p_0. */
        q=0.5/(e*gammi);    /* q_0. */
        c=1.0;
        d=x2*x2;
        sum1=p;
        for (i=1;i<=MAXIT;i++) {
            ff=(i*ff+p+q)/(i*i-xmu2);
            c *= (d/i);
            p /= (i-xmu);
            q /= (i+xmu);
            del=c*ff;
            sum += del;
            del1=c*(p-i*ff);
            sum1 += del1;
            if (fabs(del) < fabs(sum)*EPS) break;
        }
        if (i > MAXIT) nrerror("bessk series failed to converge");
        rkmu=sum;
        rk1=sum1*xi2;
    } else {                /* Evaluate CF2 by Steed's algorithm (§5.2), which is
                               OK because there can be no zero denominators. */
        b=2.0*(1.0+x);
        d=1.0/b;
        h=delh=d;
        q1=0.0;             /* Initializations for recurrence (6.7.35). */
        q2=1.0;
        a1=0.25-xmu2;
        q=c=a1;             /* First term in equation (6.7.34). */
        a = -a1;
        s=1.0+q*delh;
        for (i=2;i<=MAXIT;i++) {
            a -= 2*(i-1);
            c = -a*c/i;
            qnew=(q1-b*q2)/a;
            q1=q2;
            q2=qnew;
            q += c*qnew;
            b += 2.0;
            d=1.0/(b+a*d);
            delh=(b*d-1.0)*delh;
            h += delh;
            dels=q*delh;
            s += dels;
            if (fabs(dels/s) < EPS) break;  /* Need only test convergence of sum,
                                               since CF2 itself converges more
                                               quickly. */
        }
        if (i > MAXIT) nrerror("bessik: failure to converge in cf2");
        h=a1*h;
        rkmu=sqrt(PI/(2.0*x))*exp(-x)/s;    /* Omit the factor exp(-x) to scale all
                                               the returned functions by exp(x) for
                                               x >= XMIN. */
        rk1=rkmu*(xmu+x+0.5-h)*xi;
    }
    rkmup=xmu*xi*rkmu-rk1;
    rimu=xi/(f*rkmu-rkmup);     /* Get I_mu from Wronskian. */
    *ri=(rimu*ril1)/ril;        /* Scale original I_nu and I'_nu. */
    *rip=(rimu*rip1)/ril;
    for (i=1;i<=nl;i++) {       /* Upward recurrence of K_nu. */
        rktemp=(xmu+i)*xi2*rk1+rkmu;
        rkmu=rk1;
        rk1=rktemp;
    }
    *rk=rkmu;
    *rkp=xnu*xi*rkmu-rk1;
}
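As a quick consistency check of bessik, a minimal sketch of a test driver follows (hypothetical, not part of the Numerical Recipes library; it assumes bessik, beschb, and the nrutil routines are compiled and linked in). For nu = 1/2 the modified Bessel functions have the closed forms I_{1/2}(x) = sqrt(2/(pi x)) sinh x and K_{1/2}(x) = sqrt(pi/(2x)) e^{-x}, so those cases are easy to verify by eye.

/* Hypothetical test driver for bessik. */
#include <stdio.h>
#include <math.h>
#define PI 3.141592653589793

void bessik(float x, float xnu, float *ri, float *rk, float *rip, float *rkp);

int main(void)
{
    float x=1.5,ri,rk,rip,rkp;

    bessik(x,0.5,&ri,&rk,&rip,&rkp);
    /* Closed forms for nu = 1/2, used only as a spot check. */
    printf("I_1/2: %g (exact %g)\n",ri,sqrt(2.0/(PI*x))*sinh(x));
    printf("K_1/2: %g (exact %g)\n",rk,sqrt(PI/(2.0*x))*exp(-x));
    return 0;
}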

Airy Functions

For positive x, the Airy functions are defined by

    \mathrm{Ai}(x) = \frac{1}{\pi}\sqrt{\frac{x}{3}}\,K_{1/3}(z)    (6.7.41)

    \mathrm{Bi}(x) = \sqrt{\frac{x}{3}}\left[I_{-1/3}(z) + I_{1/3}(z)\right]    (6.7.42)

where

    z = \frac{2}{3}x^{3/2}    (6.7.43)

By using the reflection formula (6.7.40), we can convert (6.7.42) into the computationally more useful form

    \mathrm{Bi}(x) = \sqrt{x}\left[\frac{2}{\sqrt{3}}I_{1/3}(z) + \frac{1}{\pi}K_{1/3}(z)\right]    (6.7.44)

so that Ai and Bi can be evaluated with a single call to bessik.

The derivatives should not be evaluated by simply differentiating the above expressions because of possible subtraction errors near x = 0. Instead, use the equivalent expressions

    \mathrm{Ai}'(x) = -\frac{x}{\pi\sqrt{3}}K_{2/3}(z)
    \mathrm{Bi}'(x) = x\left[\frac{2}{\sqrt{3}}I_{2/3}(z) + \frac{1}{\pi}K_{2/3}(z)\right]    (6.7.45)

The corresponding formulas for negative arguments are

    \mathrm{Ai}(-x) = \frac{\sqrt{x}}{2}\left[J_{1/3}(z) - \frac{1}{\sqrt{3}}Y_{1/3}(z)\right]
    \mathrm{Bi}(-x) = -\frac{\sqrt{x}}{2}\left[\frac{1}{\sqrt{3}}J_{1/3}(z) + Y_{1/3}(z)\right]
    \mathrm{Ai}'(-x) = \frac{x}{2}\left[J_{2/3}(z) + \frac{1}{\sqrt{3}}Y_{2/3}(z)\right]
    \mathrm{Bi}'(-x) = \frac{x}{2}\left[\frac{1}{\sqrt{3}}J_{2/3}(z) - Y_{2/3}(z)\right]    (6.7.46)

#include <math.h>
#define PI 3.1415927
#define THIRD (1.0/3.0)
#define TWOTHR (2.0*THIRD)
#define ONOVRT 0.57735027

void airy(float x, float *ai, float *bi, float *aip, float *bip)
/* Returns Airy functions Ai(x), Bi(x), and their derivatives Ai'(x), Bi'(x). */
{
    void bessik(float x, float xnu, float *ri, float *rk, float *rip,
        float *rkp);
    void bessjy(float x, float xnu, float *rj, float *ry, float *rjp,
        float *ryp);
    float absx,ri,rip,rj,rjp,rk,rkp,rootx,ry,ryp,z;

    absx=fabs(x);
    rootx=sqrt(absx);
    z=TWOTHR*absx*rootx;
    if (x > 0.0) {
        bessik(z,THIRD,&ri,&rk,&rip,&rkp);
        *ai=rootx*ONOVRT*rk/PI;

        *bi=rootx*(rk/PI+2.0*ONOVRT*ri);
        bessik(z,TWOTHR,&ri,&rk,&rip,&rkp);
        *aip = -x*ONOVRT*rk/PI;
        *bip=x*(rk/PI+2.0*ONOVRT*ri);
    } else if (x < 0.0) {
        bessjy(z,THIRD,&rj,&ry,&rjp,&ryp);
        *ai=0.5*rootx*(rj-ONOVRT*ry);
        *bi = -0.5*rootx*(ry+ONOVRT*rj);
        bessjy(z,TWOTHR,&rj,&ry,&rjp,&ryp);
        *aip=0.5*absx*(ONOVRT*ry+rj);
        *bip=0.5*absx*(ONOVRT*rj-ry);
    } else {                /* Case x = 0. */
        *ai=0.35502805;
        *bi=(*ai)/ONOVRT;
        *aip = -0.25881940;
        *bip = -(*aip)/ONOVRT;
    }
}

Spherical Bessel Functions

For integer n, spherical Bessel functions are defined by

    j_n(x) = \sqrt{\frac{\pi}{2x}}\,J_{n+1/2}(x)
    y_n(x) = \sqrt{\frac{\pi}{2x}}\,Y_{n+1/2}(x)    (6.7.47)

They can be evaluated by a call to bessjy, and the derivatives can safely be found from the derivatives of equation (6.7.47).

Note that in the continued fraction CF2 in (6.7.3) just the first term survives for \nu = 1/2. Thus one can make a very simple algorithm for spherical Bessel functions along the lines of bessjy by always recursing j_n down to n = 0, setting p and q from the first term in CF2, and then recursing y_n up. No special series is required near x = 0. However, bessjy is already so efficient that we have not bothered to provide an independent routine for spherical Bessels.

#include <math.h>
#define RTPIO2 1.2533141

void sphbes(int n, float x, float *sj, float *sy, float *sjp, float *syp)
/* Returns spherical Bessel functions j_n(x), y_n(x), and their derivatives
j'_n(x), y'_n(x), for integer n. */
{
    void bessjy(float x, float xnu, float *rj, float *ry, float *rjp,
        float *ryp);
    void nrerror(char error_text[]);
    float factor,order,rj,rjp,ry,ryp;

    if (n < 0 || x <= 0.0) nrerror("bad arguments in sphbes");
    order=n+0.5;
    bessjy(x,order,&rj,&ry,&rjp,&ryp);
    factor=RTPIO2/sqrt(x);
    *sj=factor*rj;
    *sy=factor*ry;
    *sjp=factor*rjp-(*sj)/(2.0*x);
    *syp=factor*ryp-(*sy)/(2.0*x);
}
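For n = 0 the spherical Bessel functions reduce to j_0(x) = sin(x)/x and y_0(x) = -cos(x)/x, which makes an easy spot check of sphbes. A minimal sketch of such a check follows (hypothetical driver, not part of the library; it assumes sphbes and bessjy are linked in).

/* Hypothetical spot check of sphbes at n = 0. */
#include <stdio.h>
#include <math.h>

void sphbes(int n, float x, float *sj, float *sy, float *sjp, float *syp);

int main(void)
{
    float x=2.0,sj,sy,sjp,syp;

    sphbes(0,x,&sj,&sy,&sjp,&syp);
    printf("j0: %g (exact %g)\n",sj,sin(x)/x);
    printf("y0: %g (exact %g)\n",sy,-cos(x)/x);
    return 0;
}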

CITED REFERENCES AND FURTHER READING:
Barnett, A.R., Feng, D.H., Steed, J.W., and Goldfarb, L.J.B. 1974, Computer Physics Communications, vol. 8, pp. 377–395. [1]
Temme, N.M. 1976, Journal of Computational Physics, vol. 21, pp. 343–350 [2]; 1975, op. cit., vol. 19, pp. 324–337. [3]
Thompson, I.J., and Barnett, A.R. 1987, Computer Physics Communications, vol. 47, pp. 245–257. [4]
Barnett, A.R. 1981, Computer Physics Communications, vol. 21, pp. 297–314.
Thompson, I.J., and Barnett, A.R. 1986, Journal of Computational Physics, vol. 64, pp. 490–509.
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 10.

6.8 Spherical Harmonics

Spherical harmonics occur in a large variety of physical problems, for example, whenever a wave equation, or Laplace's equation, is solved by separation of variables in spherical coordinates. The spherical harmonic Y_{lm}(\theta,\phi), -l \le m \le l, is a function of the two coordinates \theta,\phi on the surface of a sphere.

The spherical harmonics are orthogonal for different l and m, and they are normalized so that their integrated square over the sphere is unity:

    \int_0^{2\pi} d\phi \int_{-1}^{1} d(\cos\theta)\,
    Y_{l'm'}^*(\theta,\phi)\,Y_{lm}(\theta,\phi) = \delta_{l'l}\,\delta_{m'm}    (6.8.1)

Here asterisk denotes complex conjugation.

Mathematically, the spherical harmonics are related to associated Legendre polynomials by the equation

    Y_{lm}(\theta,\phi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}\,
    P_l^m(\cos\theta)\,e^{im\phi}    (6.8.2)

By using the relation

    Y_{l,-m}(\theta,\phi) = (-1)^m\,Y_{lm}^*(\theta,\phi)    (6.8.3)

we can always relate a spherical harmonic to an associated Legendre polynomial with m \ge 0. With x \equiv \cos\theta, these are defined in terms of the ordinary Legendre polynomials (cf. §4.5 and §5.5) by

    P_l^m(x) = (-1)^m (1-x^2)^{m/2}\,\frac{d^m}{dx^m}P_l(x)    (6.8.4)

The first few associated Legendre polynomials, and their corresponding normalized spherical harmonics, are

    P_0^0(x) = 1                       Y_{00} = \sqrt{\frac{1}{4\pi}}
    P_1^1(x) = -(1-x^2)^{1/2}          Y_{11} = -\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{i\phi}
    P_1^0(x) = x                       Y_{10} = \sqrt{\frac{3}{4\pi}}\,\cos\theta
    P_2^2(x) = 3(1-x^2)                Y_{22} = \frac{1}{4}\sqrt{\frac{15}{2\pi}}\,\sin^2\theta\,e^{2i\phi}
    P_2^1(x) = -3(1-x^2)^{1/2}x        Y_{21} = -\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{i\phi}
    P_2^0(x) = \frac{1}{2}(3x^2-1)     Y_{20} = \sqrt{\frac{5}{4\pi}}\left(\frac{3}{2}\cos^2\theta - \frac{1}{2}\right)    (6.8.5)

There are many bad ways to evaluate associated Legendre polynomials numerically. For example, there are explicit expressions, such as

    P_l^m(x) = \frac{(-1)^m(l+m)!}{2^m m!(l-m)!}(1-x^2)^{m/2}
    \left[1 - \frac{(l-m)(m+l+1)}{1!(m+1)}\left(\frac{1-x}{2}\right)
    + \frac{(l-m)(l-m-1)(m+l+1)(m+l+2)}{2!(m+1)(m+2)}\left(\frac{1-x}{2}\right)^2
    - \cdots\right]    (6.8.6)

where the polynomial continues up through the term in (1-x)^{l-m}. (See [1] for this and related formulas.) This is not a satisfactory method because evaluation of the polynomial involves delicate cancellations between successive terms, which alternate in sign. For large l the individual terms in the polynomial become very much larger than their sum, and all accuracy is lost. In practice, (6.8.6) can be used only in single precision (32-bit) for l up to 6 or 8, and in double precision (64-bit) for l up to 15 or 18, depending on the precision required for the answer. A more robust computational procedure is therefore desirable, as follows:

The associated Legendre functions satisfy numerous recurrence relations, tabulated in [1-2]. These are recurrences on l alone, on m alone, and on both l and m simultaneously. Most of the recurrences involving m are unstable, and so dangerous for numerical work. The following recurrence on l is, however, stable (compare 5.5.1):

    (l-m)P_l^m = x(2l-1)P_{l-1}^m - (l+m-1)P_{l-2}^m    (6.8.7)

It is useful because there is a closed-form expression for the starting value,

    P_m^m = (-1)^m (2m-1)!!\,(1-x^2)^{m/2}    (6.8.8)

(The notation n!! denotes the product of all odd integers less than or equal to n.) Using (6.8.7) with l = m+1, and setting P_{m-1}^m = 0, we find

    P_{m+1}^m = x(2m+1)P_m^m    (6.8.9)

Equations (6.8.8) and (6.8.9) provide the two starting values required for (6.8.7) for general l.

The function that implements this is

#include <math.h>

float plgndr(int l, int m, float x)
/* Computes the associated Legendre polynomial P_l^m(x). Here m and l are integers
satisfying 0 <= m <= l, while x lies in the range -1 <= x <= 1. */
{
    void nrerror(char error_text[]);
    float fact,pll,pmm,pmmp1,somx2;
    int i,ll;

    if (m < 0 || m > l || fabs(x) > 1.0)
        nrerror("Bad arguments in routine plgndr");
    pmm=1.0;                        /* Compute P_m^m. */
    if (m > 0) {
        somx2=sqrt((1.0-x)*(1.0+x));
        fact=1.0;
        for (i=1;i<=m;i++) {
            pmm *= -fact*somx2;
            fact += 2.0;
        }
    }
    if (l == m)
        return pmm;
    else {                          /* Compute P_{m+1}^m. */
        pmmp1=x*(2*m+1)*pmm;
        if (l == (m+1))
            return pmmp1;
        else {                      /* Compute P_l^m, l > m+1. */
            for (ll=m+2;ll<=l;ll++) {
                pll=(x*(2*ll-1)*pmmp1-(ll+m-1)*pmm)/(ll-m);
                pmm=pmmp1;
                pmmp1=pll;
            }
            return pll;
        }
    }
}

CITED REFERENCES AND FURTHER READING:
Magnus, W., and Oberhettinger, F. 1949, Formulas and Theorems for the Functions of Mathematical Physics (New York: Chelsea), pp. 54ff. [1]
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 8. [2]
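The normalization in equation (6.8.2) is easily bolted onto plgndr. The following sketch (a hypothetical helper, not part of the library) returns the real part of Y_{lm}; the factorial ratio (l-m)!/(l+m)! is accumulated as a running product, which is adequate for modest l, while for large l one would instead work with logarithms of the factorials to avoid overflow.

/* Hypothetical helper: Re Y_lm(theta,phi) via equation (6.8.2) and plgndr. */
#include <math.h>
#define PI 3.141592653589793

float plgndr(int l, int m, float x);

float ylm_re(int l, int m, float theta, float phi)
{
    int i;
    float ratio=1.0;

    for (i=l-m+1;i<=l+m;i++) ratio /= i;    /* (l-m)!/(l+m)! */
    return sqrt((2*l+1)/(4.0*PI)*ratio)
        *plgndr(l,m,cos(theta))*cos(m*phi);
}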

6.9 Fresnel Integrals, Cosine and Sine Integrals

Fresnel Integrals

The two Fresnel integrals are defined by

    C(x) = \int_0^x \cos\!\left(\frac{\pi}{2}t^2\right)dt,\qquad
    S(x) = \int_0^x \sin\!\left(\frac{\pi}{2}t^2\right)dt    (6.9.1)

The most convenient way of evaluating these functions to arbitrary precision is to use power series for small x and a continued fraction for large x. The series are

    C(x) = x - \left(\frac{\pi}{2}\right)^2\frac{x^5}{5\cdot 2!}
         + \left(\frac{\pi}{2}\right)^4\frac{x^9}{9\cdot 4!} - \cdots
    S(x) = \left(\frac{\pi}{2}\right)\frac{x^3}{3\cdot 1!}
         - \left(\frac{\pi}{2}\right)^3\frac{x^7}{7\cdot 3!}
         + \left(\frac{\pi}{2}\right)^5\frac{x^{11}}{11\cdot 5!} - \cdots    (6.9.2)

There is a complex continued fraction that yields both S(x) and C(x) simultaneously:

    C(x) + iS(x) = \frac{1+i}{2}\,\mathrm{erf}\,z,\qquad
    z = \frac{\sqrt{\pi}}{2}(1-i)x    (6.9.3)

where

    e^{z^2}\mathrm{erfc}\,z = \frac{1}{\sqrt{\pi}}
    \left(\frac{1}{z+}\,\frac{1/2}{z+}\,\frac{1}{z+}\,\frac{3/2}{z+}\,\frac{2}{z+}\cdots\right)
    = \frac{2z}{\sqrt{\pi}}
    \left(\frac{1}{2z^2+1-}\,\frac{1\cdot 2}{2z^2+5-}\,\frac{3\cdot 4}{2z^2+9-}\cdots\right)    (6.9.4)

In the last line we have converted the "standard" form of the continued fraction to its "even" form (see §5.2), which converges twice as fast. We must be careful not to evaluate the alternating series (6.9.2) at too large a value of x; inspection of the terms shows that x = 1.5 is a good point to switch over to the continued fraction.

Note that for large x

    C(x) \sim \frac{1}{2} + \frac{1}{\pi x}\sin\!\left(\frac{\pi}{2}x^2\right),\qquad
    S(x) \sim \frac{1}{2} - \frac{1}{\pi x}\cos\!\left(\frac{\pi}{2}x^2\right)    (6.9.5)

Thus the precision of the routine frenel may be limited by the precision of the library routines for sine and cosine for large x.

#include <math.h>
#include "complex.h"
#define EPS 6.0e-8
#define MAXIT 100
#define FPMIN 1.0e-30
#define XMIN 1.5
#define PI 3.1415927
#define PIBY2 (PI/2.0)
/* Here EPS is the relative error; MAXIT is the maximum number of iterations
allowed; FPMIN is a number near the smallest representable floating-point number;
XMIN is the dividing line between using the series and continued fraction. */
#define TRUE 1
#define ONE Complex(1.0,0.0)

void frenel(float x, float *s, float *c)
/* Computes the Fresnel integrals S(x) and C(x) for all real x. */
{
    void nrerror(char error_text[]);
    int k,n,odd;
    float a,ax,fact,pix2,sign,sum,sumc,sums,term,test;
    fcomplex b,cc,d,h,del,cs;

    ax=fabs(x);
    if (ax < sqrt(FPMIN)) {     /* Special case: avoid failure of convergence
                                   test because of underflow. */
        *s=0.0;
        *c=ax;
    } else if (ax <= XMIN) {    /* Evaluate both series simultaneously. */
        sum=sums=0.0;
        sumc=ax;
        sign=1.0;
        fact=PIBY2*ax*ax;
        odd=TRUE;
        term=ax;
        n=3;
        for (k=1;k<=MAXIT;k++) {
            term *= fact/k;
            sum += sign*term/n;
            test=fabs(sum)*EPS;
            if (odd) {
                sign = -sign;
                sums=sum;
                sum=sumc;
            } else {
                sumc=sum;
                sum=sums;
            }
            if (term < test) break;
            odd=!odd;
            n += 2;
        }
        if (k > MAXIT) nrerror("series failed in frenel");
        *s=sums;
        *c=sumc;
    } else {                    /* Evaluate continued fraction by modified
                                   Lentz's method (§5.2). */
        pix2=PI*ax*ax;
        b=Complex(1.0,-pix2);
        cc=Complex(1.0/FPMIN,0.0);
        d=h=Cdiv(ONE,b);
        n = -1;
        for (k=2;k<=MAXIT;k++) {
            n += 2;
            a = -n*(n+1);
            b=Cadd(b,Complex(4.0,0.0));
            d=Cdiv(ONE,Cadd(RCmul(a,d),b));     /* Denominators cannot be zero. */

            cc=Cadd(b,Cdiv(Complex(a,0.0),cc));
            del=Cmul(cc,d);
            h=Cmul(h,del);
            if (fabs(del.r-1.0)+fabs(del.i) < EPS) break;
        }
        if (k > MAXIT) nrerror("cf failed in frenel");
        h=Cmul(Complex(ax,-ax),h);
        cs=Cmul(Complex(0.5,0.5),
            Csub(ONE,Cmul(Complex(cos(0.5*pix2),sin(0.5*pix2)),h)));
        *c=cs.r;
        *s=cs.i;
    }
    if (x < 0.0) {              /* Use antisymmetry. */
        *c = -(*c);
        *s = -(*s);
    }
}

Cosine and Sine Integrals

The cosine and sine integrals are defined by

    \mathrm{Ci}(x) = \gamma + \ln x + \int_0^x \frac{\cos t - 1}{t}\,dt
    \mathrm{Si}(x) = \int_0^x \frac{\sin t}{t}\,dt    (6.9.6)

Here \gamma \approx 0.5772\ldots is Euler's constant. We only need a way to calculate the functions for x > 0, because

    \mathrm{Si}(-x) = -\mathrm{Si}(x),\qquad
    \mathrm{Ci}(-x) = \mathrm{Ci}(x) - i\pi    (6.9.7)

Once again we can evaluate these functions by a judicious combination of power series and complex continued fraction. The series are

    \mathrm{Si}(x) = x - \frac{x^3}{3\cdot 3!} + \frac{x^5}{5\cdot 5!} - \cdots
    \mathrm{Ci}(x) = \gamma + \ln x
        + \left(-\frac{x^2}{2\cdot 2!} + \frac{x^4}{4\cdot 4!} - \cdots\right)    (6.9.8)

The continued fraction for the exponential integral E_1(ix) is

    E_1(ix) = -\mathrm{Ci}(x) + i[\mathrm{Si}(x) - \pi/2]
    = e^{-ix}\left(\frac{1}{1+ix+}\,\frac{1}{1+}\,\frac{1}{ix+}\,\frac{2}{1+}\,\frac{2}{ix+}\cdots\right)
    = e^{-ix}\left(\frac{1}{1+ix-}\,\frac{1^2}{3+ix-}\,\frac{2^2}{5+ix-}\cdots\right)    (6.9.9)

The "even" form of the continued fraction is given in the last line and converges twice as fast for about the same amount of computation. A good crossover point from the alternating series to the continued fraction is x = 2 in this case. As for the Fresnel integrals, for large x the precision may be limited by the precision of the sine and cosine routines.

#include <math.h>
#include "complex.h"
#define EPS 6.0e-8          /* Relative error, or absolute error near a zero of Ci(x). */
#define EULER 0.57721566    /* Euler's constant gamma. */
#define MAXIT 100           /* Maximum number of iterations allowed. */
#define PIBY2 1.5707963     /* pi/2. */
#define FPMIN 1.0e-30       /* Close to smallest representable floating-point number. */
#define TMIN 2.0            /* Dividing line between using the series and continued
                               fraction. */
#define TRUE 1
#define ONE Complex(1.0,0.0)

void cisi(float x, float *ci, float *si)
/* Computes the cosine and sine integrals Ci(x) and Si(x). Ci(0) is returned as a
large negative number and no error message is generated. For x < 0 the routine
returns Ci(-x) and you must supply the -i*pi yourself. */
{
    void nrerror(char error_text[]);
    int i,k,odd;
    float a,err,fact,sign,sum,sumc,sums,t,term;
    fcomplex h,b,c,d,del;

    t=fabs(x);
    if (t == 0.0) {             /* Special case. */
        *si=0.0;
        *ci = -1.0/FPMIN;
        return;
    }
    if (t > TMIN) {             /* Evaluate continued fraction by modified
                                   Lentz's method (§5.2). */
        b=Complex(1.0,t);
        c=Complex(1.0/FPMIN,0.0);
        d=h=Cdiv(ONE,b);
        for (i=2;i<=MAXIT;i++) {
            a = -(i-1)*(i-1);
            b=Cadd(b,Complex(2.0,0.0));
            d=Cdiv(ONE,Cadd(RCmul(a,d),b));     /* Denominators cannot be zero. */
            c=Cadd(b,Cdiv(Complex(a,0.0),c));
            del=Cmul(c,d);
            h=Cmul(h,del);
            if (fabs(del.r-1.0)+fabs(del.i) < EPS) break;
        }
        if (i > MAXIT) nrerror("cf failed in cisi");
        h=Cmul(Complex(cos(t),-sin(t)),h);
        *ci = -h.r;
        *si=PIBY2+h.i;
    } else {                    /* Evaluate both series simultaneously. */
        if (t < sqrt(FPMIN)) {  /* Special case: avoid failure of convergence
                                   test because of underflow. */
            sumc=0.0;
            sums=t;
        } else {
            sum=sums=sumc=0.0;
            sign=fact=1.0;
            odd=TRUE;
            for (k=1;k<=MAXIT;k++) {
                fact *= t/k;
                term=fact/k;
                sum += sign*term;
                err=term/fabs(sum);
                if (odd) {
                    sign = -sign;
                    sums=sum;
                    sum=sumc;
                } else {
                    sumc=sum;
                    sum=sums;

                }
                if (err < EPS) break;
                odd=!odd;
            }
            if (k > MAXIT) nrerror("maxits exceeded in cisi");
        }
        *si=sums;
        *ci=sumc+log(t)+EULER;
    }
    if (x < 0.0) *si = -(*si);
}

CITED REFERENCES AND FURTHER READING:
Stegun, I.A., and Zucker, R. 1976, Journal of Research of the National Bureau of Standards, vol. 80B, pp. 291–311; 1981, op. cit., vol. 86, pp. 661–686.
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapters 5 and 7.

6.10 Dawson's Integral

Dawson's Integral F(x) is defined by

    F(x) = e^{-x^2}\int_0^x e^{t^2}\,dt    (6.10.1)

The function can also be related to the complex error function by

    F(z) = \frac{i\sqrt{\pi}}{2}\,e^{-z^2}\left[1 - \mathrm{erfc}(-iz)\right]    (6.10.2)

A remarkable approximation for F(z), due to Rybicki [1], is

    F(z) = \lim_{h\to 0}\frac{1}{\sqrt{\pi}}
    \sum_{n\ \mathrm{odd}}\frac{e^{-(z-nh)^2}}{n}    (6.10.3)

What makes equation (6.10.3) unusual is that its accuracy increases exponentially as h gets small, so that quite moderate values of h (and correspondingly quite rapid convergence of the series) give very accurate approximations.

We will discuss the theory that leads to equation (6.10.3) later, in §13.11, as an interesting application of Fourier methods. Here we simply implement a routine for real values of x based on the formula.

It is first convenient to shift the summation index to center it approximately on the maximum of the exponential term. Define n_0 to be the even integer nearest to x/h, and x_0 \equiv n_0 h, x' \equiv x - x_0, and n' \equiv n - n_0, so that

    F(x) \approx \frac{1}{\sqrt{\pi}}
    \sum_{\substack{n'=-N\\ n'\ \mathrm{odd}}}^{N}
    \frac{e^{-(x'-n'h)^2}}{n'+n_0}    (6.10.4)

where the approximate equality is accurate when h is sufficiently small and N is sufficiently large. The computation of this formula can be greatly speeded up if we note that

    e^{-(x'-n'h)^2} = e^{-x'^2}\,e^{-(n'h)^2}\left(e^{2x'h}\right)^{n'}    (6.10.5)

The first factor is computed once, the second is an array of constants to be stored, and the third can be computed recursively, so that only two exponentials need be evaluated. Advantage is also taken of the symmetry of the coefficients e^{-(n'h)^2} by breaking the summation up into positive and negative values of n' separately.

In the following routine, the choices h = 0.4 and N = 11 are made. Because of the symmetry of the summations and the restriction to odd values of n, the limits on the for loops are 1 to 6. The accuracy of the result in this float version is about 2 x 10^-7. In order to maintain relative accuracy near x = 0, where F(x) vanishes, the program branches to the evaluation of the power series [2] for F(x), for |x| < 0.2.

#include <math.h>
#include "nrutil.h"
#define NMAX 6
#define H 0.4
#define A1 (2.0/3.0)
#define A2 0.4
#define A3 (2.0/7.0)

float dawson(float x)
/* Returns Dawson's integral F(x) = exp(-x^2) * integral from 0 to x of exp(t^2) dt,
for any real x. */
{
    int i,n0;
    float d1,d2,e1,e2,sum,x2,xp,xx,ans;
    static float c[NMAX+1];
    static int init = 0;        /* Flag is 0 if we need to initialize, else 1. */

    if (init == 0) {
        init=1;
        for (i=1;i<=NMAX;i++) c[i]=exp(-SQR((2.0*i-1.0)*H));
    }
    if (fabs(x) < 0.2) {        /* Use series expansion. */
        x2=x*x;
        ans=x*(1.0-A1*x2*(1.0-A2*x2*(1.0-A3*x2)));
    } else {                    /* Use sampling theorem representation. */
        xx=fabs(x);
        n0=2*(int)(0.5*xx/H+0.5);
        xp=xx-n0*H;
        e1=exp(2.0*xp*H);
        e2=e1*e1;
        d1=n0+1;
        d2=d1-2.0;
        sum=0.0;
        for (i=1;i<=NMAX;i++,d1+=2.0,d2-=2.0,e1*=e2)
            sum += c[i]*(e1/d1+1.0/(d2*e1));
        ans=0.5641895835*SIGN(exp(-xp*xp),x)*sum;   /* Constant is 1/sqrt(pi). */
    }
    return ans;
}
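Differentiating (6.10.1) shows that Dawson's integral satisfies the simple differential equation F'(x) = 1 - 2xF(x), which makes a convenient independent check of the routine. A minimal sketch of such a check follows (hypothetical driver, assuming dawson and the nrutil routines are linked in); a centered finite difference of the routine's output should agree with 1 - 2xF(x) to several digits.

/* Hypothetical consistency check of dawson via F'(x) = 1 - 2xF(x). */
#include <stdio.h>

float dawson(float x);

int main(void)
{
    float x=1.0,h=1.0e-3;
    float deriv=(dawson(x+h)-dawson(x-h))/(2.0*h);

    printf("F'(%g) ~ %g, 1-2xF(x) = %g\n",x,deriv,1.0-2.0*x*dawson(x));
    return 0;
}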

Other methods for computing Dawson's integral are also known [2,3].

CITED REFERENCES AND FURTHER READING:
Rybicki, G.B. 1989, Computers in Physics, vol. 3, no. 2, pp. 85–87. [1]
Cody, W.J., Pociorek, K.A., and Thatcher, H.C. 1970, Mathematics of Computation, vol. 24, pp. 171–178. [2]
McCabe, J.H. 1974, Mathematics of Computation, vol. 28, pp. 811–816. [3]

6.11 Elliptic Integrals and Jacobian Elliptic Functions

Elliptic integrals occur in many applications, because any integral of the form

    \int R(t,s)\,dt    (6.11.1)

where R is a rational function of t and s, and s is the square root of a cubic or quartic polynomial in t, can be evaluated in terms of elliptic integrals. Standard references [1] describe how to carry out the reduction, which was originally done by Legendre. Legendre showed that only three basic elliptic integrals are required. The simplest of these is

    I_1 = \int_y^x \frac{dt}{\sqrt{(a_1+b_1 t)(a_2+b_2 t)(a_3+b_3 t)(a_4+b_4 t)}}    (6.11.2)

where we have written the quartic s^2 in factored form. In standard integral tables [2], one of the limits of integration is always a zero of the quartic, while the other limit lies closer than the next zero, so that there is no singularity within the interval. To evaluate I_1, we simply break the interval [y,x] into subintervals, each of which either begins or ends on a singularity. The tables, therefore, need only distinguish the eight cases in which each of the four zeros (ordered according to size) appears as the upper or lower limit of integration. In addition, when one of the b's in (6.11.2) tends to zero, the quartic reduces to a cubic, with the largest or smallest singularity moving to \pm\infty; this leads to eight more cases (actually just special cases of the first eight). The sixteen cases in total are then usually tabulated in terms of Legendre's standard elliptic integral of the 1st kind, which we will define below. By a change of the variable of integration t, the zeros of the quartic are mapped to standard locations on the real axis. Then only two dimensionless parameters are needed to tabulate Legendre's integral. However, the symmetry of the original integral (6.11.2) under permutation of the roots is concealed in Legendre's notation. We will get back to Legendre's notation below. But first, here is a better way:

Carlson [3] has given a new definition of a standard elliptic integral of the first kind,

    R_F(x,y,z) = \frac{1}{2}\int_0^\infty
    \frac{dt}{\sqrt{(t+x)(t+y)(t+z)}}    (6.11.3)

where x, y, and z are nonnegative and at most one is zero. By standardizing the range of integration, he retains permutation symmetry for the zeros. (Weierstrass' canonical form also has this property.) Carlson first shows that when x or y is a zero of the quartic in (6.11.2), the integral I_1 can be written in terms of R_F in a form that is symmetric under permutation of the remaining three zeros. In the general case when neither x nor y is a zero, two such R_F functions can be combined into a single one by an addition theorem, leading to the fundamental formula

    I_1 = 2 R_F(U_{12}^2, U_{13}^2, U_{14}^2)    (6.11.4)

where

    U_{ij} = (X_i X_j Y_k Y_m + Y_i Y_j X_k X_m)/(x - y)    (6.11.5)

    X_i = (a_i + b_i x)^{1/2},\qquad Y_i = (a_i + b_i y)^{1/2}    (6.11.6)

and i,j,k,m is any permutation of 1,2,3,4. A short-cut in evaluating these expressions is

    U_{13}^2 = U_{12}^2 - (a_1 b_4 - a_4 b_1)(a_2 b_3 - a_3 b_2)
    U_{14}^2 = U_{12}^2 - (a_1 b_3 - a_3 b_1)(a_2 b_4 - a_4 b_2)    (6.11.7)

The U's correspond to the three ways of pairing the four zeros, and I_1 is thus manifestly symmetric under permutation of the zeros. Equation (6.11.4) therefore reproduces all sixteen cases when one limit is a zero, and also includes the cases when neither limit is a zero.

Thus Carlson's function allows arbitrary ranges of integration and arbitrary positions of the branch points of the integrand relative to the interval of integration. To handle elliptic integrals of the second and third kinds, Carlson defines the standard integral of the third kind as

    R_J(x,y,z,p) = \frac{3}{2}\int_0^\infty
    \frac{dt}{(t+p)\sqrt{(t+x)(t+y)(t+z)}}    (6.11.8)

which is symmetric in x, y, and z. The degenerate case when two arguments are equal is denoted

    R_D(x,y,z) = R_J(x,y,z,z)    (6.11.9)

and is symmetric in x and y. The function R_D replaces Legendre's integral of the second kind. The degenerate form of R_F is denoted

    R_C(x,y) = R_F(x,y,y)    (6.11.10)

It embraces logarithmic, inverse circular, and inverse hyperbolic functions.

Carlson [4-7] gives integral tables in terms of the exponents of the linear factors of the quartic in (6.11.1). For example, the integral where the exponents are (1/2, 1/2, -1/2, -3/2) can be expressed as a single integral in terms of R_D; it accounts for 144 separate cases in Gradshteyn and Ryzhik [2]!

Refer to Carlson's papers [3-7] for some of the practical details in reducing elliptic integrals to his standard forms, such as handling complex conjugate zeros.

Turn now to the numerical evaluation of elliptic integrals. The traditional methods [8] are Gauss or Landen transformations. Descending transformations decrease the modulus k of the Legendre integrals towards zero, increasing transformations increase it towards unity. In these limits the functions have simple analytic expressions.
While these methods converge quadratically and are quite satisfactory for integrals of the first and second kinds, they generally lead to loss of significant figures in certain regimes for integrals of the third kind. Carlson's algorithms [9,10], by contrast, provide a unified method for all three kinds with no significant cancellations.

The key ingredient in these algorithms is the duplication theorem:

    R_F(x,y,z) = 2 R_F(x+\lambda,\,y+\lambda,\,z+\lambda)
    = R_F\!\left(\frac{x+\lambda}{4},\,\frac{y+\lambda}{4},\,\frac{z+\lambda}{4}\right)    (6.11.11)

where

    \lambda = (xy)^{1/2} + (xz)^{1/2} + (yz)^{1/2}    (6.11.12)

This theorem can be proved by a simple change of variable of integration [11]. Equation (6.11.11) is iterated until the arguments of R_F are nearly equal. For equal arguments we have

    R_F(x,x,x) = x^{-1/2}    (6.11.13)

When the arguments are close enough, the function is evaluated from a fixed Taylor expansion about (6.11.13) through fifth-order terms. While the iterative part of the algorithm is only linearly convergent, the error ultimately decreases by a factor of 4^6 = 4096 for each iteration. Typically only two or three iterations are required, perhaps six or seven if the initial values of the arguments have huge ratios. We list the algorithm for R_F here, and refer you to Carlson's paper [9] for the other cases.

Stage 1: For n = 0, 1, 2, ..., compute

    \mu_n = (x_n + y_n + z_n)/3
    X_n = 1 - (x_n/\mu_n),\quad Y_n = 1 - (y_n/\mu_n),\quad Z_n = 1 - (z_n/\mu_n)
    \epsilon_n = \max(|X_n|, |Y_n|, |Z_n|)

If \epsilon_n < tol go to Stage 2; else compute

    \lambda_n = (x_n y_n)^{1/2} + (x_n z_n)^{1/2} + (y_n z_n)^{1/2}
    x_{n+1} = (x_n+\lambda_n)/4,\quad y_{n+1} = (y_n+\lambda_n)/4,\quad z_{n+1} = (z_n+\lambda_n)/4

and repeat this stage.

Stage 2: Compute

    E_2 = X_n Y_n - Z_n^2,\qquad E_3 = X_n Y_n Z_n
    R_F = \left(1 - \frac{1}{10}E_2 + \frac{1}{14}E_3
        + \frac{1}{24}E_2^2 - \frac{3}{44}E_2 E_3\right)\big/\sqrt{\mu_n}

In some applications the argument p in R_J or the argument y in R_C is negative, and the Cauchy principal value of the integral is required. This is easily handled by using the formulas

    R_J(x,y,z,p) = \frac{(\gamma - y)R_J(x,y,z,\gamma) - 3R_F(x,y,z)
        + 3R_C(xz/y,\,p\gamma/y)}{y - p}    (6.11.14)

where

    \gamma \equiv y + \frac{(z-y)(y-x)}{y-p}    (6.11.15)

is positive if p is negative, and

    R_C(x,-y) = \left(\frac{x}{x+y}\right)^{1/2} R_C(x+y,\,y)    (6.11.16)

The Cauchy principal value of R_J has a zero at some value of p < 0, so (6.11.14) will give some loss of significant figures near the zero.

#include <math.h>
#include "nrutil.h"
#define ERRTOL 0.08
#define TINY 1.5e-38
#define BIG 3.0e37
#define THIRD (1.0/3.0)
#define C1 (1.0/24.0)
#define C2 0.1
#define C3 (3.0/44.0)
#define C4 (1.0/14.0)

float rf(float x, float y, float z)
/* Computes Carlson's elliptic integral of the first kind, R_F(x,y,z). x, y, and z
must be nonnegative, and at most one can be zero. TINY must be at least 5 times the
machine underflow limit, BIG at most one fifth the machine overflow limit. */
{
    float alamb,ave,delx,dely,delz,e2,e3,sqrtx,sqrty,sqrtz,xt,yt,zt;

    if (FMIN(FMIN(x,y),z) < 0.0 || FMIN(FMIN(x+y,x+z),y+z) < TINY ||
        FMAX(FMAX(x,y),z) > BIG)
            nrerror("invalid arguments in rf");
    xt=x;
    yt=y;
    zt=z;
    do {
        sqrtx=sqrt(xt);
        sqrty=sqrt(yt);
        sqrtz=sqrt(zt);
        alamb=sqrtx*(sqrty+sqrtz)+sqrty*sqrtz;
        xt=0.25*(xt+alamb);
        yt=0.25*(yt+alamb);
        zt=0.25*(zt+alamb);
        ave=THIRD*(xt+yt+zt);
        delx=(ave-xt)/ave;
        dely=(ave-yt)/ave;
        delz=(ave-zt)/ave;
    } while (FMAX(FMAX(fabs(delx),fabs(dely)),fabs(delz)) > ERRTOL);
    e2=delx*dely-delz*delz;
    e3=delx*dely*delz;
    return (1.0+(C1*e2-C2-C3*e3)*e2+C4*e3)/sqrt(ave);
}

A value of 0.08 for the error tolerance parameter is adequate for single precision (7 significant digits). Since the error scales as \epsilon_n^6, we see that 0.0025 will yield double precision (16 significant digits) and require at most two or three more iterations. Since the coefficients of the sixth-order truncation error are different for the other elliptic functions, these values for the error tolerance should be changed to 0.04 and 0.0012 in the algorithm for R_C, and 0.05 and 0.0015 for R_D and R_J. As well as being an algorithm in its own right for certain combinations of elementary functions, the algorithm for R_C is used repeatedly in the computation of R_J.

The C implementations test the input arguments against two machine-dependent constants, TINY and BIG, to ensure that there will be no underflow or overflow during the computation. We have chosen conservative values, corresponding to a machine minimum of 3 x 10^-39 and a machine maximum of 1.7 x 10^38. You can always extend the range of admissible argument values by using the homogeneity relations (6.11.22), below.
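For instance, the complete elliptic integral of the first kind is K(k) = R_F(0, 1-k^2, 1), a relation derived below as equation (6.11.19). A minimal sketch of its use follows (hypothetical driver, assuming rf and the nrutil routines are linked in).

/* Hypothetical example: K(k) from rf via equation (6.11.19) below. */
#include <stdio.h>

float rf(float x, float y, float z);

int main(void)
{
    float k=0.5;

    printf("K(%g) = %g\n",k,rf(0.0,(1.0-k)*(1.0+k),1.0));
    return 0;
}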

#include <math.h>
#include "nrutil.h"
#define ERRTOL 0.05
#define TINY 1.0e-25
#define BIG 4.5e21
#define C1 (3.0/14.0)
#define C2 (1.0/6.0)
#define C3 (9.0/22.0)
#define C4 (3.0/26.0)
#define C5 (0.25*C3)
#define C6 (1.5*C4)

float rd(float x, float y, float z)
/* Computes Carlson's elliptic integral of the second kind, R_D(x,y,z). x and y
must be nonnegative, and at most one can be zero. z must be positive. TINY must be
at least twice the negative 2/3 power of the machine overflow limit. BIG must be at
most 0.1 x ERRTOL times the negative 2/3 power of the machine underflow limit. */
{
    float alamb,ave,delx,dely,delz,ea,eb,ec,ed,ee,fac,sqrtx,sqrty,
        sqrtz,sum,xt,yt,zt;

    if (FMIN(x,y) < 0.0 || FMIN(x+y,z) < TINY || FMAX(FMAX(x,y),z) > BIG)
        nrerror("invalid arguments in rd");
    xt=x;
    yt=y;
    zt=z;
    sum=0.0;
    fac=1.0;
    do {
        sqrtx=sqrt(xt);
        sqrty=sqrt(yt);
        sqrtz=sqrt(zt);
        alamb=sqrtx*(sqrty+sqrtz)+sqrty*sqrtz;
        sum += fac/(sqrtz*(zt+alamb));
        fac=0.25*fac;
        xt=0.25*(xt+alamb);
        yt=0.25*(yt+alamb);
        zt=0.25*(zt+alamb);
        ave=0.2*(xt+yt+3.0*zt);
        delx=(ave-xt)/ave;
        dely=(ave-yt)/ave;
        delz=(ave-zt)/ave;
    } while (FMAX(FMAX(fabs(delx),fabs(dely)),fabs(delz)) > ERRTOL);
    ea=delx*dely;
    eb=delz*delz;
    ec=ea-eb;
    ed=ea-6.0*eb;
    ee=ed+ec+ec;
    return 3.0*sum+fac*(1.0+ed*(-C1+C5*ed-C6*delz*ee)
        +delz*(C2*ee+delz*(-C3*ec+delz*C4*ea)))/(ave*sqrt(ave));
}
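The companion to the example above is the complete elliptic integral of the second kind, E(k) = R_F(0,1-k^2,1) - (k^2/3) R_D(0,1-k^2,1), a relation given below as equation (6.11.20). A minimal sketch (hypothetical driver, assuming rf and rd are linked in):

/* Hypothetical example: E(k) from rf and rd via equation (6.11.20) below. */
#include <stdio.h>

float rf(float x, float y, float z);
float rd(float x, float y, float z);

int main(void)
{
    float k=0.5,q=(1.0-k)*(1.0+k);

    printf("E(%g) = %g\n",k,rf(0.0,q,1.0)-k*k*rd(0.0,q,1.0)/3.0);
    return 0;
}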

#include <math.h>
#include "nrutil.h"
#define ERRTOL 0.05
#define TINY 2.5e-13
#define BIG 9.0e11
#define C1 (3.0/14.0)
#define C2 (1.0/3.0)
#define C3 (3.0/22.0)
#define C4 (3.0/26.0)
#define C5 (0.75*C3)
#define C6 (1.5*C4)
#define C7 (0.5*C2)
#define C8 (C3+C3)

float rj(float x, float y, float z, float p)
/* Computes Carlson's elliptic integral of the third kind, R_J(x,y,z,p). x, y, and
z must be nonnegative, and at most one can be zero. p must be nonzero. If p < 0,
the Cauchy principal value is returned. TINY must be at least twice the cube root
of the machine underflow limit, BIG at most one fifth the cube root of the machine
overflow limit. */
{
    float rc(float x, float y);
    float rf(float x, float y, float z);
    float a,alamb,alpha,ans,ave,b,beta,delp,delx,dely,delz,ea,eb,ec,
        ed,ee,fac,pt,rcx,rho,sqrtx,sqrty,sqrtz,sum,tau,xt,yt,zt;

    if (FMIN(FMIN(x,y),z) < 0.0 || FMIN(FMIN(FMIN(x+y,x+z),y+z),fabs(p)) < TINY
        || FMAX(FMAX(FMAX(x,y),z),fabs(p)) > BIG)
            nrerror("invalid arguments in rj");
    sum=0.0;
    fac=1.0;
    if (p > 0.0) {
        xt=x;
        yt=y;
        zt=z;
        pt=p;
    } else {
        xt=FMIN(FMIN(x,y),z);
        zt=FMAX(FMAX(x,y),z);
        yt=x+y+z-xt-zt;
        a=1.0/(yt-p);
        b=a*(zt-yt)*(yt-xt);
        pt=yt+b;
        rho=xt*zt/yt;
        tau=p*pt/yt;
        rcx=rc(rho,tau);
    }
    do {
        sqrtx=sqrt(xt);
        sqrty=sqrt(yt);
        sqrtz=sqrt(zt);
        alamb=sqrtx*(sqrty+sqrtz)+sqrty*sqrtz;
        alpha=SQR(pt*(sqrtx+sqrty+sqrtz)+sqrtx*sqrty*sqrtz);
        beta=pt*SQR(pt+alamb);
        sum += fac*rc(alpha,beta);
        fac=0.25*fac;
        xt=0.25*(xt+alamb);
        yt=0.25*(yt+alamb);
        zt=0.25*(zt+alamb);
        pt=0.25*(pt+alamb);
        ave=0.2*(xt+yt+zt+pt+pt);
        delx=(ave-xt)/ave;
        dely=(ave-yt)/ave;
        delz=(ave-zt)/ave;
        delp=(ave-pt)/ave;
    } while (FMAX(FMAX(FMAX(fabs(delx),fabs(dely)),
        fabs(delz)),fabs(delp)) > ERRTOL);
    ea=delx*(dely+delz)+dely*delz;
    eb=delx*dely*delz;
    ec=delp*delp;
    ed=ea-3.0*ec;
    ee=eb+2.0*delp*(ea-ec);
    ans=3.0*sum+fac*(1.0+ed*(-C1+C5*ed-C6*ee)+eb*(C7+delp*(-C8+delp*C4))
        +delp*ea*(C2-delp*C3)-C2*delp*ec)/(ave*sqrt(ave));
    if (p <= 0.0) ans=a*(b*ans+3.0*(rcx-rf(xt,yt,zt)));
    return ans;
}

#include <math.h>
#include "nrutil.h"
#define ERRTOL 0.04
#define TINY 1.69e-38
#define SQRTNY 1.3e-19
#define BIG 3.e37
#define TNBG (TINY*BIG)
#define COMP1 (2.236/SQRTNY)
#define COMP2 (TNBG*TNBG/25.0)
#define THIRD (1.0/3.0)
#define C1 0.3
#define C2 (1.0/7.0)
#define C3 0.375
#define C4 (9.0/22.0)

float rc(float x, float y)
/* Computes Carlson's degenerate elliptic integral, R_C(x,y). x must be nonnegative
and y must be nonzero. If y < 0, the Cauchy principal value is returned. TINY must
be at least 5 times the machine underflow limit, BIG at most one fifth the machine
maximum overflow limit. */
{
    float alamb,ave,s,w,xt,yt;

    if (x < 0.0 || y == 0.0 || (x+fabs(y)) < TINY || (x+fabs(y)) > BIG ||
        (y<-COMP1 && x > 0.0 && x < COMP2))
            nrerror("invalid arguments in rc");
    if (y > 0.0) {
        xt=x;
        yt=y;
        w=1.0;
    } else {
        xt=x-y;
        yt = -y;
        w=sqrt(x)/sqrt(xt);
    }
    do {
        alamb=2.0*sqrt(xt)*sqrt(yt)+yt;
        xt=0.25*(xt+alamb);
        yt=0.25*(yt+alamb);
        ave=THIRD*(xt+yt+yt);
        s=(yt-ave)/ave;
    } while (fabs(s) > ERRTOL);
    return w*(1.0+s*s*(C1+s*(C2+s*(C3+s*C4))))/sqrt(ave);
}

At times you may want to express your answer in Legendre's notation. Alternatively, you may be given results in that notation and need to compute their values with the programs given above. It is a simple matter to transform back and forth. The Legendre elliptic integral of the 1st kind is defined as

    F(\phi,k) \equiv \int_0^\phi \frac{d\theta}{\sqrt{1 - k^2\sin^2\theta}}    (6.11.17)

The complete elliptic integral of the 1st kind is given by

    K(k) \equiv F(\pi/2, k)    (6.11.18)

In terms of R_F,

    F(\phi,k) = \sin\phi\,R_F(\cos^2\phi,\,1-k^2\sin^2\phi,\,1)
    K(k) = R_F(0,\,1-k^2,\,1)    (6.11.19)

The Legendre elliptic integral of the 2nd kind and the complete elliptic integral of the 2nd kind are given by

    E(\phi,k) \equiv \int_0^\phi \sqrt{1 - k^2\sin^2\theta}\;d\theta
    = \sin\phi\,R_F(\cos^2\phi,\,1-k^2\sin^2\phi,\,1)
      - \frac{1}{3}k^2\sin^3\phi\,R_D(\cos^2\phi,\,1-k^2\sin^2\phi,\,1)
    E(k) \equiv E(\pi/2,k)
    = R_F(0,\,1-k^2,\,1) - \frac{1}{3}k^2\,R_D(0,\,1-k^2,\,1)    (6.11.20)

Finally, the Legendre elliptic integral of the 3rd kind is

    \Pi(\phi,n,k) \equiv \int_0^\phi
    \frac{d\theta}{(1+n\sin^2\theta)\sqrt{1-k^2\sin^2\theta}}
    = \sin\phi\,R_F(\cos^2\phi,\,1-k^2\sin^2\phi,\,1)
      - \frac{n}{3}\sin^3\phi\,R_J(\cos^2\phi,\,1-k^2\sin^2\phi,\,1,\,1+n\sin^2\phi)    (6.11.21)

(Note that this sign convention for n is opposite that of Abramowitz and Stegun [12], and that their sin α is our k.)

#include <math.h>
#include "nrutil.h"

float ellf(float phi, float ak)
/* Legendre elliptic integral of the 1st kind F(phi,k), evaluated using Carlson's
function R_F. The argument ranges are 0 <= phi <= pi/2, 0 <= k*sin(phi) <= 1. */
{
    float rf(float x, float y, float z);
    float s;

    s=sin(phi);
    return s*rf(SQR(cos(phi)),(1.0-s*ak)*(1.0+s*ak),1.0);
}

#include <math.h>
#include "nrutil.h"

float elle(float phi, float ak)
/* Legendre elliptic integral of the 2nd kind E(phi,k), evaluated using Carlson's
functions R_D and R_F. The argument ranges are 0 <= phi <= pi/2,
0 <= k*sin(phi) <= 1. */
{
    float rd(float x, float y, float z);
    float rf(float x, float y, float z);
    float cc,q,s;

    s=sin(phi);
    cc=SQR(cos(phi));
    q=(1.0-s*ak)*(1.0+s*ak);
    return s*(rf(cc,q,1.0)-(SQR(s*ak))*rd(cc,q,1.0)/3.0);
}
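A classic use of the complete integral of the 2nd kind is the circumference of an ellipse: with semi-axes a >= b and eccentricity e = sqrt(1 - b^2/a^2), the circumference is 4a E(e). A minimal sketch of this computation (hypothetical driver, assuming elle, rf, rd, and the nrutil routines are linked in):

/* Hypothetical example: circumference of an ellipse via elle. */
#include <stdio.h>
#include <math.h>
#define PI 3.141592653589793

float elle(float phi, float ak);

int main(void)
{
    float a=2.0,b=1.0;
    float e=sqrt(1.0-(b*b)/(a*a));  /* eccentricity */

    printf("circumference = %g\n",4.0*a*elle(0.5*PI,e));
    return 0;
}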

#include <math.h>
#include "nrutil.h"

float ellpi(float phi, float en, float ak)
/* Legendre elliptic integral of the 3rd kind Pi(phi,n,k), evaluated using Carlson's
functions R_J and R_F. (Note that the sign convention on n is opposite that of
Abramowitz and Stegun.) The ranges of phi and k are 0 <= phi <= pi/2,
0 <= k*sin(phi) <= 1. */
{
    float rf(float x, float y, float z);
    float rj(float x, float y, float z, float p);
    float cc,enss,q,s;

    s=sin(phi);
    enss=en*s*s;
    cc=SQR(cos(phi));
    q=(1.0-s*ak)*(1.0+s*ak);
    return s*(rf(cc,q,1.0)-enss*rj(cc,q,1.0,1.0+enss)/3.0);
}

Carlson's functions are homogeneous of degree -1/2 and -3/2, so

    R_F(\lambda x, \lambda y, \lambda z) = \lambda^{-1/2} R_F(x,y,z)
    R_J(\lambda x, \lambda y, \lambda z, \lambda p) = \lambda^{-3/2} R_J(x,y,z,p)    (6.11.22)

Thus to express a Carlson function in Legendre's notation, permute the first three arguments into ascending order, use homogeneity to scale the third argument to be 1, and then use equations (6.11.19)–(6.11.21).

Jacobian Elliptic Functions

The Jacobian elliptic function sn is defined as follows: instead of considering the elliptic integral

    u(y,k) \equiv u = F(\phi,k)    (6.11.23)

consider the inverse function

    y = \sin\phi = \mathrm{sn}(u,k)    (6.11.24)

Equivalently,

    u = \int_0^{\mathrm{sn}} \frac{dy}{\sqrt{(1-y^2)(1-k^2 y^2)}}    (6.11.25)

When k = 0, sn is just sin. The functions cn and dn are defined by the relations

    \mathrm{sn}^2 + \mathrm{cn}^2 = 1,\qquad
    k^2\,\mathrm{sn}^2 + \mathrm{dn}^2 = 1    (6.11.26)

The routine given below actually takes m_c \equiv k_c^2 = 1 - k^2 as an input parameter. It also computes all three functions sn, cn, and dn since computing all three is no harder than computing any one of them. For a description of the method, see [8].

#include <math.h>
#define CA 0.0003           /* The accuracy is the square of CA. */

void sncndn(float uu, float emmc, float *sn, float *cn, float *dn)
/* Returns the Jacobian elliptic functions sn(u,k_c), cn(u,k_c), and dn(u,k_c).
Here uu = u, while emmc = k_c^2. */
{
    float a,b,c,d,emc,u;
    float em[14],en[14];
    int i,ii,l,bo;

    emc=emmc;
    u=uu;
    if (emc) {
        bo=(emc < 0.0);
        if (bo) {
            d=1.0-emc;
            emc /= -1.0/d;
            u *= (d=sqrt(d));
        }
        a=1.0;
        *dn=1.0;
        for (i=1;i<=13;i++) {
            l=i;
            em[i]=a;
            en[i]=(emc=sqrt(emc));
            c=0.5*(a+emc);
            if (fabs(a-emc) <= CA*a) break;
            emc *= a;
            a=c;
        }
        u *= c;
        *sn=sin(u);
        *cn=cos(u);
        if (*sn) {
            a=(*cn)/(*sn);
            c *= a;
            for (ii=l;ii>=1;ii--) {
                b=em[ii];
                a *= c;
                c *= (*dn);
                *dn=(en[ii]+a)/(b+a);
                a=c/b;
            }
            a=1.0/sqrt(c*c+1.0);
            *sn=(*sn >= 0.0 ? a : -a);
            *cn=c*(*sn);
        }
        if (bo) {
            a=(*dn);
            *dn=(*cn);
            *cn=a;
            *sn /= d;
        }
    } else {
        *cn=1.0/cosh(u);
        *dn=(*cn);
        *sn=tanh(u);
    }
}
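The identities (6.11.26) make a convenient check of sncndn; note that the routine takes emmc = k_c^2 = 1 - k^2, not k itself. A minimal sketch of such a check (hypothetical driver, assuming sncndn is linked in):

/* Hypothetical check of sncndn against the identities (6.11.26). */
#include <stdio.h>

void sncndn(float uu, float emmc, float *sn, float *cn, float *dn);

int main(void)
{
    float k=0.7,u=1.2,sn,cn,dn;

    sncndn(u,(1.0-k)*(1.0+k),&sn,&cn,&dn);
    printf("sn^2+cn^2 = %g\n",sn*sn+cn*cn);            /* should be 1 */
    printf("k^2 sn^2 + dn^2 = %g\n",k*k*sn*sn+dn*dn);  /* should be 1 */
    return 0;
}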

CITED REFERENCES AND FURTHER READING:
Erdélyi, A., Magnus, W., Oberhettinger, F., and Tricomi, F.G. 1953, Higher Transcendental Functions, Vol. II (New York: McGraw-Hill). [1]
Gradshteyn, I.S., and Ryzhik, I.W. 1980, Table of Integrals, Series, and Products (New York: Academic Press). [2]
Carlson, B.C. 1977, SIAM Journal on Mathematical Analysis, vol. 8, pp. 231–242. [3]
Carlson, B.C. 1987, Mathematics of Computation, vol. 49, pp. 595–606 [4]; 1988, op. cit., vol. 51, pp. 267–280 [5]; 1989, op. cit., vol. 53, pp. 327–333 [6]; 1991, op. cit., vol. 56, pp. 267–280. [7]
Bulirsch, R. 1965, Numerische Mathematik, vol. 7, pp. 78–90; 1965, op. cit., vol. 7, pp. 353–354; 1969, op. cit., vol. 13, pp. 305–315. [8]
Carlson, B.C. 1979, Numerische Mathematik, vol. 33, pp. 1–16. [9]
Carlson, B.C., and Notis, E.M. 1981, ACM Transactions on Mathematical Software, vol. 7, pp. 398–403. [10]
Carlson, B.C. 1978, SIAM Journal on Mathematical Analysis, vol. 9, pp. 524–528. [11]
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York), Chapter 17. [12]
Mathews, J., and Walker, R.L. 1970, Mathematical Methods of Physics, 2nd ed. (Reading, MA: W.A. Benjamin/Addison-Wesley), pp. 78–79.

6.12 Hypergeometric Functions

As was discussed in §5.14, a fast, general routine for the complex hypergeometric function {}_2F_1(a,b,c;z) is difficult or impossible. The function is defined as the analytic continuation of the hypergeometric series,

    {}_2F_1(a,b,c;z) = 1 + \frac{ab}{c}\frac{z}{1!}
    + \frac{a(a+1)b(b+1)}{c(c+1)}\frac{z^2}{2!} + \cdots
    + \frac{a(a+1)\cdots(a+j-1)\,b(b+1)\cdots(b+j-1)}
           {c(c+1)\cdots(c+j-1)}\frac{z^j}{j!} + \cdots    (6.12.1)

This series converges only within the unit circle |z| < 1 (see [1]), but one's interest in the function is not confined to this region.

Section 5.14 discussed the method of evaluating this function by direct path integration in the complex plane. We here merely list the routines that result.

Implementation of the function hypgeo is straightforward, and is described by comments in the program. The machinery associated with Chapter 16's routine for integrating differential equations, odeint, is only minimally intrusive, and need not even be completely understood: use of odeint requires one zeroed global variable, one function call, and a prescribed format for the derivative routine hypdrv.

The function hypgeo will fail, of course, for values of z too close to the singularity at 1. (If you need to approach this singularity, or the one at \infty, use the "linear transformation formulas" in §15.3 of [1].) Away from z = 1, and for moderate values of a,b,c, it is often remarkable how few steps are required to integrate the equations. A half-dozen is typical.

#include <math.h>
#include "complex.h"
#include "nrutil.h"
#define EPS 1.0e-6                /* Accuracy parameter. */

fcomplex aa,bb,cc,z0,dz;          /* Communicates with hypdrv. */
int kmax,kount;                   /* Used by odeint. */
float *xp,**yp,dxsav;

fcomplex hypgeo(fcomplex a, fcomplex b, fcomplex c, fcomplex z)
/* Complex hypergeometric function 2F1 for complex a, b, c, and z, by direct
   integration of the hypergeometric equation in the complex plane.  The
   branch cut is taken to lie along the real axis, Re z > 1. */
{
    void bsstep(float y[], float dydx[], int nv, float *xx, float htry,
        float eps, float yscal[], float *hdid, float *hnext,
        void (*derivs)(float, float [], float []));
    void hypdrv(float s, float yy[], float dyyds[]);
    void hypser(fcomplex a, fcomplex b, fcomplex c, fcomplex z,
        fcomplex *series, fcomplex *deriv);
    void odeint(float ystart[], int nvar, float x1, float x2,
        float eps, float h1, float hmin, int *nok, int *nbad,
        void (*derivs)(float, float [], float []),
        void (*rkqs)(float [], float [], int, float *, float, float,
        float [], float *, float *, void (*)(float, float [], float [])));
    int nbad,nok;
    fcomplex ans,y[3];
    float *yy;

    kmax=0;
    if (z.r*z.r+z.i*z.i <= 0.25) {             /* Use series... */
        hypser(a,b,c,z,&ans,&y[2]);
        return ans;
    }
    else if (z.r < 0.0) z0=Complex(-0.5,0.0);  /* ...or pick a starting point
                                                  for the path integration. */
    else if (z.r <= 1.0) z0=Complex(0.5,0.0);
    else z0=Complex(0.0,z.i >= 0.0 ? 0.5 : -0.5);
    aa=a;                                      /* Load the global variables to
                                                  pass parameters "over the
                                                  head" of odeint to hypdrv. */
    bb=b;
    cc=c;
    dz=Csub(z,z0);
    hypser(aa,bb,cc,z0,&y[1],&y[2]);           /* Get starting function and
                                                  derivative. */
    yy=vector(1,4);
    yy[1]=y[1].r;
    yy[2]=y[1].i;
    yy[3]=y[2].r;
    yy[4]=y[2].i;
    odeint(yy,4,0.0,1.0,EPS,0.1,0.0001,&nok,&nbad,hypdrv,bsstep);
    /* The arguments to odeint are the vector of dependent variables, its
       length, the starting and ending values of the independent variable,
       the accuracy parameter, an initial guess for stepsize, a minimum
       stepsize, the (returned) number of good and bad steps taken, and the
       names of the derivative routine and the (here Bulirsch-Stoer) stepping
       routine. */
    y[1]=Complex(yy[1],yy[2]);
    free_vector(yy,1,4);
    return y[1];
}
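A hypothetical check of our own, not from the book: the special case 2F1(1,1;2;z) = -ln(1-z)/z gives an analytic value to compare against. The driver below assumes hypgeo and its supporting routines (hypser and hypdrv below, plus odeint, bsstep, and the NR complex-arithmetic and nrutil files) are compiled into the same program.

#include <stdio.h>
#include <math.h>
#include "complex.h"

fcomplex hypgeo(fcomplex a, fcomplex b, fcomplex c, fcomplex z);

int main(void)
{
    fcomplex f;
    double xr,xi,lr,li,d,er,ei;

    f=hypgeo(Complex(1.0,0.0),Complex(1.0,0.0),Complex(2.0,0.0),
        Complex(0.3,0.2));
    /* Analytic value -ln(1-z)/z for z = 0.3 + 0.2i, computed by hand: */
    xr=1.0-0.3; xi= -0.2;              /* 1 - z */
    lr=0.5*log(xr*xr+xi*xi);           /* Re ln(1-z) */
    li=atan2(xi,xr);                   /* Im ln(1-z) */
    d=0.3*0.3+0.2*0.2;                 /* |z|^2 */
    er=-(lr*0.3+li*0.2)/d;             /* Re of -ln(1-z)/z */
    ei=-(li*0.3-lr*0.2)/d;             /* Im of -ln(1-z)/z */
    printf("hypgeo:   %f %f\n",f.r,f.i);
    printf("analytic: %f %f\n",er,ei);
    return 0;
}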

#include "complex.h"
#define ONE Complex(1.0,0.0)

void hypser(fcomplex a, fcomplex b, fcomplex c, fcomplex z,
    fcomplex *series, fcomplex *deriv)
/* Returns the hypergeometric series 2F1 and its derivative, iterating to
   machine accuracy.  For |z| <= 1/2 convergence is quite rapid. */
{
    void nrerror(char error_text[]);
    int n;
    fcomplex aa,bb,cc,fac,temp;

    deriv->r=0.0;
    deriv->i=0.0;
    fac=Complex(1.0,0.0);
    temp=fac;
    aa=a;
    bb=b;
    cc=c;
    for (n=1;n<=1000;n++) {
        fac=Cmul(fac,Cdiv(Cmul(aa,bb),cc));
        deriv->r+=fac.r;
        deriv->i+=fac.i;
        fac=Cmul(fac,RCmul(1.0/n,z));
        *series=Cadd(temp,fac);
        if (series->r == temp.r && series->i == temp.i) return;
        temp= *series;
        aa=Cadd(aa,ONE);
        bb=Cadd(bb,ONE);
        cc=Cadd(cc,ONE);
    }
    nrerror("convergence failure in hypser");
}

#include "complex.h"
#define ONE Complex(1.0,0.0)

extern fcomplex aa,bb,cc,z0,dz;   /* Defined in hypgeo. */

void hypdrv(float s, float yy[], float dyyds[])
/* Computes derivatives for the hypergeometric equation; see text equation
   (5.14.4). */
{
    fcomplex z,y[3],dyds[3];

    y[1]=Complex(yy[1],yy[2]);
    y[2]=Complex(yy[3],yy[4]);
    z=Cadd(z0,RCmul(s,dz));
    dyds[1]=Cmul(y[2],dz);
    dyds[2]=Cmul(Csub(Cmul(Cmul(aa,bb),y[1]),Cmul(Csub(cc,
        Cmul(Cadd(Cadd(aa,bb),ONE),z)),y[2])),
        Cdiv(dz,Cmul(z,Csub(ONE,z))));
    dyyds[1]=dyds[1].r;
    dyyds[2]=dyds[1].i;
    dyyds[3]=dyds[2].r;
    dyyds[4]=dyds[2].i;
}

CITED REFERENCES AND FURTHER READING:
Abramowitz, M., and Stegun, I.A. 1964, Handbook of Mathematical Functions, Applied Mathematics Series, Volume 55 (Washington: National Bureau of Standards; reprinted 1968 by Dover Publications, New York). [1]

Chapter 7. Random Numbers

7.0 Introduction

It may seem perverse to use a computer, that most precise and deterministic of all machines conceived by the human mind, to produce "random" numbers. More than perverse, it may seem to be a conceptual impossibility. Any program, after all, will produce output that is entirely predictable, hence not truly "random."

Nevertheless, practical computer "random number generators" are in common use. We will leave it to philosophers of the computer age to resolve the paradox in a deep way (see, e.g., Knuth [1] §3.5 for discussion and references). One sometimes hears computer-generated sequences termed pseudo-random, while the word random is reserved for the output of an intrinsically random physical process, like the elapsed time between clicks of a Geiger counter placed next to a sample of some radioactive element. We will not try to make such fine distinctions.

A working, though imprecise, definition of randomness in the context of computer-generated sequences is to say that the deterministic program that produces a random sequence should be different from, and — in all measurable respects — statistically uncorrelated with, the computer program that uses its output. In other words, any two different random number generators ought to produce statistically the same results when coupled to your particular applications program. If they don't, then at least one of them is not (from your point of view) a good generator.

The above definition may seem circular, comparing, as it does, one generator to another. However, there exists a body of random number generators which mutually do satisfy the definition over a very, very broad class of applications programs. And it is also found empirically that statistically identical results are obtained from random numbers produced by physical processes. So, because such generators are known to exist, we can leave to the philosophers the problem of defining them.

A pragmatic point of view, then, is that randomness is in the eye of the beholder (or programmer). What is random enough for one application may not be random enough for another. Still, one is not entirely adrift in a sea of incommensurable applications programs: There is a certain list of statistical tests, some sensible and some merely enshrined by history, which on the whole will do a very good job of ferreting out any correlations that are likely to be detected by an applications program (in this case, yours). Good random number generators ought to pass all of these tests; or at least the user had better be aware of any that they fail, so that he or she will be able to judge whether they are relevant to the case at hand.

As for references on this subject, the one to turn to first is Knuth [1]. Then try [2]. Only a few of the standard books on numerical methods [3-4] treat topics relating to random numbers.

CITED REFERENCES AND FURTHER READING:
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), Chapter 3, especially §3.5. [1]
Bratley, P., Fox, B.L., and Schrage, E.L. 1983, A Guide to Simulation (New York: Springer-Verlag). [2]
Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), Chapter 11. [3]
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), Chapter 10. [4]

7.1 Uniform Deviates

Uniform deviates are just random numbers that lie within a specified range (typically 0 to 1), with any one number in the range just as likely as any other. They are, in other words, what you probably think "random numbers" are. However, we want to distinguish uniform deviates from other sorts of random numbers, for example numbers drawn from a normal (Gaussian) distribution of specified mean and standard deviation. These other sorts of deviates are almost always generated by performing appropriate operations on one or more uniform deviates, as we will see in subsequent sections. So, a reliable source of random uniform deviates, the subject of this section, is an essential building block for any sort of stochastic modeling or Monte Carlo computer work.

System-Supplied Random Number Generators

Most C implementations have, lurking within, a pair of library routines for initializing, and then generating, "random numbers." In ANSI C, the synopsis is:

#include <stdlib.h>
#define RAND_MAX ...
void srand(unsigned seed);
int rand(void);

You initialize the random number generator by invoking srand(seed) with some arbitrary seed. Each initializing value will typically result in a different random sequence, or at least a different starting point in some one enormously long sequence. The same initializing value of seed will always return the same random sequence, however.

You obtain successive random numbers in the sequence by successive calls to rand(). That function returns an integer that is typically in the range 0 to the largest representable positive value of type int (inclusive). Usually, as in ANSI C, this largest value is available as RAND_MAX, but sometimes you have to figure it out for yourself. If you want a random float value between 0.0 (inclusive) and 1.0 (exclusive), you get it by an expression like

x = rand()/(RAND_MAX+1.0);

Now our first, and perhaps most important, lesson in this chapter is: be very, very suspicious of a system-supplied rand() that resembles the one just described. If all scientific papers whose results are in doubt because of bad rand()s were to disappear from library shelves, there would be a gap on each shelf about as big as your fist. System-supplied rand()s are almost always linear congruential generators, which generate a sequence of integers I_1, I_2, I_3, \ldots, each between 0 and m-1 (e.g., m = RAND_MAX) by the recurrence relation

    I_{j+1} = a I_j + c \pmod{m}    (7.1.1)

Here m is called the modulus, and a and c are positive integers called the multiplier and the increment, respectively. The recurrence (7.1.1) will eventually repeat itself, with a period that is obviously no greater than m. If m, a, and c are properly chosen, then the period will be of maximal length, i.e., of length m. In that case, all possible integers between 0 and m-1 occur at some point, so any initial "seed" choice of I_0 is as good as any other: the sequence just takes off from that point.

Although this general framework is powerful enough to provide quite decent random numbers, its implementation in many, if not most, ANSI C libraries is quite flawed; quite a number of implementations are in the category "totally botched." Blame should be apportioned about equally between the ANSI C committee and the implementors. The typical problems are these: First, since the ANSI standard specifies that rand() return a value of type int — which is only a two-byte quantity on many machines — RAND_MAX is often not very large. The ANSI C standard requires only that it be at least 32767. This can be disastrous in many circumstances: for a Monte Carlo integration (§7.6 and §7.8), you might well want to evaluate 10^6 different points, but actually be evaluating the same 32767 points 30 times each, not at all the same thing! You should categorically reject any library random number routine with a two-byte returned value.

Second, the ANSI committee's published rationale includes the following mischievous passage: "The committee decided that an implementation should be allowed to provide a rand function which generates the best random sequence possible in that implementation, and therefore mandated no standard algorithm. It recognized the value, however, of being able to generate the same pseudo-random sequence in different implementations, and so it has published an example. ... [emphasis added]" The "example" is

unsigned long next=1;

int rand(void)   /* NOT RECOMMENDED (see text) */
{
    next = next*1103515245 + 12345;
    return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
    next=seed;
}

This corresponds to equation (7.1.1) with a = 1103515245, c = 12345, and m = 2^{32} (since arithmetic done on unsigned long quantities is guaranteed to return the correct low-order bits). These are not particularly good choices for a and c (the period is only 2^{30}), though they are not gross embarrassments by themselves. The real botches occur when implementors, taking the committee's statement above as license, try to "improve" on the published example. For example, one popular 32-bit PC-compatible compiler provides a long generator that uses the above congruence, but swaps the high-order and low-order 16 bits of the returned value. Somebody probably thought that this extra flourish added randomness; in fact it ruins the generator. While these kinds of blunders can, of course, be fixed, there remains a fundamental flaw in simple linear congruential generators, which we now discuss.

The linear congruential method has the advantage of being very fast, requiring only a few operations per call, hence its almost universal use. It has the disadvantage that it is not free of sequential correlation on successive calls. If k random numbers at a time are used to plot points in k-dimensional space (with each coordinate between 0 and 1), then the points will not tend to "fill up" the k-dimensional space, but rather will lie on (k-1)-dimensional "planes." There will be at most about m^{1/k} such planes. If the constants m, a, and c are not very carefully chosen, there will be many fewer than that. If m is as bad as 32768, then the number of planes on which triples of points lie in three-dimensional space will be no greater than about the cube root of 32768, or 32. Even if m is close to the machine's largest representable integer, e.g., ~2^{32}, the number of planes on which triples of points lie in three-dimensional space is usually no greater than about the cube root of 2^{32}, about 1600. You might well be focusing attention on a physical process that occurs in a small fraction of the total volume, so that the discreteness of the planes can be very pronounced.

Even worse, you might be using a generator whose choices of m, a, and c have been botched. One infamous such routine, RANDU, with a = 65539 and m = 2^{31}, was widespread on IBM mainframe computers for many years, and was widely copied onto other systems [1]. One of us recalls producing a "random" plot with only 11 planes, and being told by his computer center's programming consultant that he had misused the random number generator: "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random." Figure that out.

Correlation in k-space is not the only weakness of linear congruential generators. Such generators often have their low-order (least significant) bits much less random than their high-order bits.
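A small illustration of our own, not from the book: when the modulus is a power of two, the k-th least significant bit of a linear congruential sequence has period at most 2^k; with odd a and odd c the lowest bit simply alternates 0, 1, 0, 1, ..., whatever the seed. The committee's example recurrence above makes this easy to see:

#include <stdio.h>

int main(void)
{
    unsigned long next = 1;
    int i;

    for (i = 0; i < 16; i++) {
        next = next*1103515245 + 12345;   /* the committee's recurrence */
        printf("%lu", next & 1);          /* low-order bit: 0,1,0,1,... */
    }
    printf("\n");
    return 0;
}

(The alternation holds whether unsigned long is 32 or 64 bits, since both cases are power-of-two moduli.)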
If you want to generate a random integer between 1 and 10, you should always do it using the high-order bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits). Similarly you should never try to take apart a "rand()" number into several supposedly random pieces. Instead use separate calls for every piece.

Portable Random Number Generators

Park and Miller [1] have surveyed a large number of random number generators that have been used over the last 30 years or more. Along with a good theoretical review, they present an anecdotal sampling of a number of inadequate generators that have come into widespread use. The historical record is nothing if not appalling.

There is good evidence, both theoretical and empirical, that the simple multiplicative congruential algorithm

    I_{j+1} = a I_j \pmod{m}    (7.1.2)

can be as good as any of the more general linear congruential generators that have c ≠ 0 (equation 7.1.1) — if the multiplier a and modulus m are chosen exquisitely carefully. Park and Miller propose a "Minimal Standard" generator based on the choices

    a = 7^5 = 16807, \qquad m = 2^{31} - 1 = 2147483647    (7.1.3)

First proposed by Lewis, Goodman, and Miller in 1969, this generator has in subsequent years passed all new theoretical tests, and (perhaps more importantly) has accumulated a large amount of successful use. Park and Miller do not claim that the generator is "perfect" (we will see below that it is not), but only that it is a good minimal standard against which other generators should be judged.

It is not possible to implement equations (7.1.2) and (7.1.3) directly in a high-level language, since the product of a and m-1 exceeds the maximum value for a 32-bit integer. Assembly language implementation using a 64-bit product register is straightforward, but not portable from machine to machine. A trick due to Schrage [2,3] for multiplying two 32-bit integers modulo a 32-bit constant, without using any intermediates larger than 32 bits (including a sign bit), is therefore extremely interesting: It allows the Minimal Standard generator to be implemented in essentially any programming language on essentially any machine.

Schrage's algorithm is based on an approximate factorization of m,

    m = aq + r, \qquad q = [m/a], \quad r = m \bmod a    (7.1.4)

with square brackets denoting the integer part. If r is small, specifically r < q, then for any z in 1, ..., m-1 both a(z mod q) and r[z/q] lie in the range 0, ..., m-1, and

    az \bmod m = a(z \bmod q) - r[z/q], \quad \text{adding } m \text{ if the result is negative}    (7.1.5)
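The following is a bare sketch of Schrage's trick, our illustration rather than the book's routine (the book's generator, ran0, follows below). It uses the Minimal Standard constants, so the first outputs from seed 1 can be checked by hand: 16807, 282475249, 1622650073, ...

#include <stdio.h>

long schrage_mulmod(long a, long z, long m)
/* Computes (a*z) % m by equations (7.1.4)-(7.1.5) without any
   intermediate exceeding 32 bits (including the sign bit). */
{
    long q = m / a;                   /* q = [m/a] */
    long r = m % a;                   /* r = m mod a; here r < q */
    long t = a*(z % q) - r*(z/q);
    return (t < 0) ? t + m : t;
}

int main(void)
{
    long seed = 1;
    int i;

    for (i = 0; i < 5; i++) {
        seed = schrage_mulmod(16807L, seed, 2147483647L);
        printf("%ld\n", seed);
    }
    return 0;
}

The book's routine ran0, next, applies the same trick with an added XOR mask against zero seeds.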

#define IA 16807
#define IM 2147483647
#define AM (1.0/IM)
#define IQ 127773
#define IR 2836
#define MASK 123459876

float ran0(long *idum)
/* "Minimal" random number generator of Park and Miller.  Returns a uniform
   random deviate between 0.0 and 1.0.  Set or reset idum to any integer
   value (except the unlikely value MASK) to initialize the sequence; idum
   must not be altered between calls for successive deviates in a sequence. */
{
    long k;
    float ans;

    *idum ^= MASK;                  /* XORing with MASK allows use of zero
                                       and other simple bit patterns for
                                       idum. */
    k=(*idum)/IQ;
    *idum=IA*(*idum-k*IQ)-IR*k;     /* Compute idum=(IA*idum) % IM without
                                       overflows by Schrage's method. */
    if (*idum < 0) *idum += IM;
    ans=AM*(*idum);                 /* Convert idum to a floating result. */
    *idum ^= MASK;                  /* Unmask before return. */
    return ans;
}

The period of ran0 is 2^{31} - 2 ≈ 2.1 × 10^9. A peculiarity of generators of the form (7.1.2) is that the value 0 must never be allowed as the initial seed — it perpetuates itself — and it never occurs for any nonzero initial seed. Experience has shown that users always manage to call random number generators with the seed idum=0. That is why ran0 performs its exclusive-or with an arbitrary constant both on entry and exit. If you are the first user in history to be proof against human error, you can remove the two lines with the ^ operation.

Park and Miller discuss two other multipliers a that can be used with the same m = 2^{31} - 1. These are a = 48271 (with q = 44488 and r = 3399) and a = 69621 (with q = 30845 and r = 23902). These can be substituted in the routine ran0 if desired; they may be slightly superior to Lewis et al.'s longer-tested values. No values other than these should be used.

The routine ran0 is a Minimal Standard, satisfactory for the majority of applications, but we do not recommend it as the final word on random number generators. Our reason is precisely the simplicity of the Minimal Standard. It is not hard to think of situations where successive random numbers might be used in a way that accidentally conflicts with the generation algorithm. For example, since successive numbers differ by a multiple of only 1.6 × 10^4 out of a modulus of more than 2 × 10^9, very small random numbers will tend to be followed by smaller than average values. One time in 10^6, for example, there will be a value < 10^{-6} returned (as there should be), but this will always be followed by a value less than about 0.0168. One can easily think of applications involving rare events where this property would lead to wrong results.

There are other, more subtle, serial correlations present in ran0. For example, if successive points (I_i, I_{i+1}) are binned into a two-dimensional plane for i = 1, ..., N, then the resulting distribution fails the χ² test when N is greater than a few × 10^7, much less than the period m-2. Since low-order serial correlations have historically been such a bugaboo, and since there is a very simple way to remove them, we think that it is prudent to do so.
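Before moving on, a minimal test driver of our own, assuming ran0 above is compiled into the same program; the sample mean of many uniform deviates should approach 0.5:

#include <stdio.h>

float ran0(long *idum);

int main(void)
{
    long idum = 42;          /* any seed except the value MASK */
    double sum = 0.0;
    int i, n = 100000;

    for (i = 0; i < n; i++) sum += ran0(&idum);
    printf("mean of %d deviates: %f (expect about 0.5)\n", n, sum/n);
    return 0;
}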

The following routine, ran1, uses the Minimal Standard for its random value, but it shuffles the output to remove low-order serial correlations. A random deviate derived from the jth value in the sequence, I_j, is output not on the jth call, but rather on a randomized later call, j+32 on average. The shuffling algorithm is due to Bays and Durham as described in Knuth [4], and is illustrated in Figure 7.1.1.

#define IA 16807
#define IM 2147483647
#define AM (1.0/IM)
#define IQ 127773
#define IR 2836
#define NTAB 32
#define NDIV (1+(IM-1)/NTAB)
#define EPS 1.2e-7
#define RNMX (1.0-EPS)

float ran1(long *idum)
/* "Minimal" random number generator of Park and Miller with Bays-Durham
   shuffle and added safeguards.  Returns a uniform random deviate between
   0.0 and 1.0 (exclusive of the endpoint values).  Call with idum a negative
   integer to initialize; thereafter, do not alter idum between successive
   deviates in a sequence.  RNMX should approximate the largest floating
   value that is less than 1. */
{
    int j;
    long k;
    static long iy=0;
    static long iv[NTAB];
    float temp;

    if (*idum <= 0 || !iy) {           /* Initialize. */
        if (-(*idum) < 1) *idum=1;     /* Be sure to prevent idum = 0. */
        else *idum = -(*idum);
        for (j=NTAB+7;j>=0;j--) {      /* Load the shuffle table (after 8
                                          warm-ups). */
            k=(*idum)/IQ;
            *idum=IA*(*idum-k*IQ)-IR*k;
            if (*idum < 0) *idum += IM;
            if (j < NTAB) iv[j] = *idum;
        }
        iy=iv[0];
    }
    k=(*idum)/IQ;                      /* Start here when not initializing. */
    *idum=IA*(*idum-k*IQ)-IR*k;        /* Compute idum=(IA*idum) % IM without
                                          overflows by Schrage's method. */
    if (*idum < 0) *idum += IM;
    j=iy/NDIV;                         /* Will be in the range 0..NTAB-1. */
    iy=iv[j];                          /* Output previously stored value and
                                          refill the shuffle table. */
    iv[j] = *idum;
    if ((temp=AM*iy) > RNMX) return RNMX;   /* Because users don't expect
                                               endpoint values. */
    else return temp;
}

The routine ran1 passes those statistical tests that ran0 is known to fail. In fact, we do not know of any statistical test that ran1 fails to pass, except when the number of calls starts to become on the order of the period m, say > 10^8 ≈ m/20.

For situations when even longer random sequences are needed, L'Ecuyer [6] has given a good way of combining two different sequences with different periods so as to obtain a new sequence whose period is the least common multiple of the two periods. The basic idea is simply to add the two sequences, modulo the modulus of one of them (call it m).

Figure 7.1.1. Shuffling procedure used in ran1 to break up sequential correlations in the Minimal Standard generator. Circled numbers indicate the sequence of events: On each call, the random number in iy is used to choose a random element in the array iv. That element becomes the output random number, and also becomes the next iy. Its spot in iv is refilled from the Minimal Standard routine.

A trick to avoid an intermediate value that overflows the integer wordsize is to subtract rather than add, and then add back the constant m-1 if the result is ≤ 0, so as to wrap around into the desired interval 0, ..., m-1. Notice that it is not necessary that this wrapped subtraction be able to reach every value 0, ..., m-1 from every value of the first sequence. Consider the absurd extreme case where the value subtracted was only between 1 and 10: The resulting sequence would still be no less random than the first sequence by itself. As a practical matter it is only necessary that the second sequence have a range covering substantially all of the range of the first. L'Ecuyer recommends the use of the two generators m_1 = 2147483563 (with a_1 = 40014, q_1 = 53668, and r_1 = 12211) and m_2 = 2147483399 (with a_2 = 40692, q_2 = 52774, and r_2 = 3791). Both moduli are slightly less than 2^{31}. The periods m_1 - 1 = 2 × 3 × 7 × 631 × 81031 and m_2 - 1 = 2 × 19 × 31 × 1019 × 1789 share only the factor 2, so the period of the combined generator is ≈ 2.3 × 10^{18}. For present computers, period exhaustion is a practical impossibility.

Combining the two generators breaks up serial correlations to a considerable extent. We nevertheless recommend the additional shuffle that is implemented in the following routine, ran2. We think that, within the limits of its floating-point precision, ran2 provides perfect random numbers; a practical definition of "perfect" is that we will pay $1000 to the first reader who convinces us otherwise (by finding a statistical test that ran2 fails in a nontrivial way, excluding the ordinary limitations of a machine's floating-point representation).
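Here is a bare sketch of the wrapped subtraction by itself, our illustration with made-up sample values; ran2 below does the real work:

#include <stdio.h>

long combine(long i1, long i2, long m1)
/* Combine a deviate i1 (in 1..m1-1) of the first sequence with a deviate
   i2 of the second, wrapping the difference back into 1..m1-1 without any
   intermediate exceeding m1. */
{
    long iy = i1 - i2;            /* may be zero or negative */
    if (iy < 1) iy += m1 - 1;     /* wrap around into 1..m1-1 */
    return iy;
}

int main(void)
{
    /* a small first deviate minus a large second deviate wraps around: */
    printf("%ld\n", combine(5L, 2147483390L, 2147483563L));   /* prints 177 */
    return 0;
}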

#define IM1 2147483563
#define IM2 2147483399
#define AM (1.0/IM1)
#define IMM1 (IM1-1)
#define IA1 40014
#define IA2 40692
#define IQ1 53668
#define IQ2 52774
#define IR1 12211
#define IR2 3791
#define NTAB 32
#define NDIV (1+IMM1/NTAB)
#define EPS 1.2e-7
#define RNMX (1.0-EPS)

float ran2(long *idum)
/* Long period (> 2 × 10^18) random number generator of L'Ecuyer with
   Bays-Durham shuffle and added safeguards.  Returns a uniform random
   deviate between 0.0 and 1.0 (exclusive of the endpoint values).  Call with
   idum a negative integer to initialize; thereafter, do not alter idum
   between successive deviates in a sequence.  RNMX should approximate the
   largest floating value that is less than 1. */
{
    int j;
    long k;
    static long idum2=123456789;
    static long iy=0;
    static long iv[NTAB];
    float temp;

    if (*idum <= 0) {                  /* Initialize. */
        if (-(*idum) < 1) *idum=1;     /* Be sure to prevent idum = 0. */
        else *idum = -(*idum);
        idum2=(*idum);
        for (j=NTAB+7;j>=0;j--) {      /* Load the shuffle table (after 8
                                          warm-ups). */
            k=(*idum)/IQ1;
            *idum=IA1*(*idum-k*IQ1)-k*IR1;
            if (*idum < 0) *idum += IM1;
            if (j < NTAB) iv[j] = *idum;
        }
        iy=iv[0];
    }
    k=(*idum)/IQ1;                     /* Start here when not initializing. */
    *idum=IA1*(*idum-k*IQ1)-k*IR1;     /* Compute idum=(IA1*idum) % IM1
                                          without overflows by Schrage's
                                          method. */
    if (*idum < 0) *idum += IM1;
    k=idum2/IQ2;
    idum2=IA2*(idum2-k*IQ2)-k*IR2;     /* Compute idum2=(IA2*idum2) % IM2
                                          likewise. */
    if (idum2 < 0) idum2 += IM2;
    j=iy/NDIV;                         /* Will be in the range 0..NTAB-1. */
    iy=iv[j]-idum2;                    /* Here idum is shuffled, idum and
                                          idum2 are combined to generate
                                          output. */
    iv[j] = *idum;
    if (iy < 1) iy += IMM1;
    if ((temp=AM*iy) > RNMX) return RNMX;   /* Because users don't expect
                                               endpoint values. */
    else return temp;
}

L'Ecuyer [6] lists additional short generators that can be combined into longer ones, including generators that can be implemented in 16-bit integer arithmetic.

Finally, we give you Knuth's suggestion [4] for a portable routine, which we have translated to the present conventions as ran3. This is not based on the linear congruential method at all, but rather on a subtractive method (see also [5]). One might hope that its weaknesses, if any, are therefore of a highly different character from the weaknesses, if any, of ran1 above.

If you ever suspect trouble with one routine, it is a good idea to try the other in the same application. ran3 has one nice feature: if your machine is poor on integer arithmetic (i.e., is limited to 16-bit integers), you can declare mj, mk, and ma[] as float, define mbig and mseed as 4000000 and 1618033, respectively, and the routine will be rendered entirely floating-point.

#include <stdlib.h>    /* Change to math.h in K&R C. */
#define MBIG 1000000000
#define MSEED 161803398
#define MZ 0
#define FAC (1.0/MBIG)
/* According to Knuth, any large MBIG, and any smaller (but still large)
   MSEED can be substituted for the above values. */

float ran3(long *idum)
/* Returns a uniform random deviate between 0.0 and 1.0.  Set idum to any
   negative value to initialize or reinitialize the sequence. */
{
    static int inext,inextp;
    static long ma[56];      /* The value 56 (range ma[1..55]) is special and
                                should not be modified; see Knuth. */
    static int iff=0;
    long mj,mk;
    int i,ii,k;

    if (*idum < 0 || iff == 0) {     /* Initialization. */
        iff=1;
        mj=labs(MSEED-labs(*idum));  /* Initialize ma[55] using the seed idum
                                        and the large number MSEED. */
        mj %= MBIG;
        ma[55]=mj;
        mk=1;
        for (i=1;i<=54;i++) {        /* Now initialize the rest of the table,
                                        in a slightly random order, with
                                        numbers that are not especially
                                        random. */
            ii=(21*i) % 55;
            ma[ii]=mk;
            mk=mj-mk;
            if (mk < MZ) mk += MBIG;
            mj=ma[ii];
        }
        for (k=1;k<=4;k++)           /* We randomize them by "warming up the
                                        generator." */
            for (i=1;i<=55;i++) {
                ma[i] -= ma[1+(i+30) % 55];
                if (ma[i] < MZ) ma[i] += MBIG;
            }
        inext=0;                     /* Prepare indices for our first
                                        generated number. */
        inextp=31;                   /* The constant 31 is special; see
                                        Knuth. */
        *idum=1;
    }
    if (++inext == 56) inext=1;      /* Here is where we start, except on
                                        initialization.  Increment inext and
                                        inextp, wrapping around 56 to 1. */
    if (++inextp == 56) inextp=1;
    mj=ma[inext]-ma[inextp];         /* Generate a new random number
                                        subtractively. */
    if (mj < MZ) mj += MBIG;         /* Be sure that it is in range. */
    ma[inext]=mj;                    /* Store it, */
    return mj*FAC;                   /* and output the derived uniform
                                        deviate. */
}

Quick and Dirty Generators

One sometimes would like a "quick and dirty" generator to embed in a program, perhaps taking only one or two lines of code, just to somewhat randomize things.

One might wish to process data from an experiment not always in exactly the same order, for example, so that the first output is more "typical" than might otherwise be the case.

For this kind of application, all we really need is a list of "good" choices for m, a, and c in equation (7.1.1). If we don't need a period longer than 10^4 to 10^6, say, we can keep the value of (m-1)a + c small enough to avoid overflows that would otherwise mandate the extra complexity of Schrage's method (above). We can thus easily embed in our programs

unsigned long jran,ia,ic,im;
float ran;
...
jran=(jran*ia+ic) % im;
ran=(float) jran / (float) im;

whenever we want a quick and dirty uniform deviate, or

jran=(jran*ia+ic) % im;
j=jlo+((jhi-jlo+1)*jran)/im;

whenever we want an integer between jlo and jhi, inclusive. (In both cases jran was once initialized to any seed value between 0 and im-1.)

Be sure to remember, however, that when im is small, the number of planes in k-space, which is the kth root of it, is even smaller! So a quick and dirty generator should never be used to select points in k-space with k > 1.

With these caveats, some "good" choices for the constants are given in the accompanying table. These constants (i) give a period of maximal length im, and, more important, (ii) pass Knuth's "spectral test" for dimensions 2, 3, 4, 5, and 6. The increment ic is a prime, close to the value (1/2 - √3/6) im; actually almost any value of ic that is relatively prime to im will do just as well, but there is some "lore" favoring this choice (see [4], p. 84).

An Even Quicker Generator

In C, if you multiply two unsigned long int integers on a machine with a 32-bit long integer representation, the value returned is the low-order 32 bits of the true 64-bit product. If we now choose m = 2^{32}, the "mod" in equation (7.1.1) is free, and we have simply

    I_{j+1} = a I_j + c    (7.1.6)

Knuth suggests a = 1664525 as a suitable multiplier for this value of m. H.W. Lewis has conducted extensive tests of this value of a with c = 1013904223, which is a prime close to (√5 - 2) m. The resulting in-line generator (we will call it ranqd1) is simply

unsigned long idum;
...
idum = 1664525L*idum + 1013904223L;

This is about as good as any 32-bit linear congruential generator, entirely adequate for many uses. And, with only a single multiply and add, it is very fast.

To check whether your machine has the desired integer properties, see if you can generate the following sequence of 32-bit values (given here in hex): 00000000, 3C6EF35F, 47502932, D1CCF6E9, AAF95334, 6252E503, 9F2EC686, 57FE6C2D, A3D95FA8, 81FDBEE7, 94F0AF1A, CBF633B1.
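A small check program of our own for the first values of that list; the explicit 32-bit mask makes the result the same even where unsigned long is 64 bits:

#include <stdio.h>

int main(void)
{
    unsigned long idum = 0;
    int i;

    printf("%08lX\n", idum);     /* the initial value, 00000000 */
    for (i = 0; i < 5; i++) {
        idum = (1664525UL*idum + 1013904223UL) & 0xffffffffUL;
        printf("%08lX\n", idum); /* 3C6EF35F, 47502932, ... */
    }
    return 0;
}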
If you need floating-point values instead of 32-bit integers, and want to avoid a divide by 2^{32}, a dirty trick is to mask in an exponent that makes the value lie between 1 and 2, then subtract 1.0. The resulting in-line generator (call it ranqd2) will look something like

unsigned long idum,itemp;
float rand;
#ifdef vax
static unsigned long jflone = 0x00004080;
static unsigned long jflmsk = 0xffff007f;
#else
static unsigned long jflone = 0x3f800000;
static unsigned long jflmsk = 0x007fffff;
#endif
...
idum = 1664525L*idum + 1013904223L;
itemp = jflone | (jflmsk & idum);
rand = (*(float *)&itemp)-1.0;

The hex constants 3F800000 and 007FFFFF are the appropriate ones for computers using the IEEE representation for 32-bit floating-point numbers (e.g., IBM PCs and most UNIX workstations). For DEC VAXes, the correct hex constants are, respectively, 00004080 and FFFF007F. Notice that the IEEE mask results in the floating-point number being constructed out of the 23 low-order bits of the integer, which is not ideal. (Your authors have tried very hard to make almost all of the material in this book machine and compiler independent — indeed, even programming language independent. This subsection is a rare aberration. Forgive us. Once in a great while the temptation to be really dirty is just irresistible.)

Constants for Quick and Dirty Random Number Generators

overflow at        im      ia      ic
2^20             6075     106    1283
2^21             7875     211    1663
2^22             7875     421    1663
2^23             6075    1366    1283
                 6655     936    1399
                11979     430    2531
2^24            14406     967    3041
                29282     419    6173
                53125     171   11213
2^25            12960    1741    2731
                14000    1541    2957
                21870    1291    4621
                31104     625    6571
               139968     205   29573
2^26            29282    1255    6173
                81000     421   17117
               134456     281   28411
2^27            86436    1093   18257
               121500    1021   25673
               259200     421   54773
2^28           117128    1277   24749
               121500    2041   25673
               312500     741   66037
2^29           145800    3661   30809
               175000    2661   36979
               233280    1861   49297
               244944    1597   51749
2^30           139968    3877   29573
               214326    3613   45289
               714025    1366  150889
2^31           134456    8121   28411
               259200    7141   54773
2^32           233280    9301   49297
               714025    4096  150889

Relative Timings and Recommendations

Timings are inevitably machine dependent. Nevertheless the following table

is indicative of the relative timings, for typical machines, of the various uniform generators discussed in this section, plus ran4 from §7.5. Smaller values in the table indicate faster generators. The generators ranqd1 and ranqd2 refer to the "quick and dirty" generators immediately above.

Generator    Relative Execution Time
ran0         ≡ 1.0
ran1         ≈ 1.3
ran2         ≈ 2.0
ran3         ≈ 0.6
ranqd1       ≈ 0.10
ranqd2       ≈ 0.25
ran4         ≈ 4.0

On balance, we recommend ran1 for general use. It is portable, based on Park and Miller's Minimal Standard generator with an additional shuffle, and has no known (to us) flaws other than period exhaustion.

If you are generating more than 100,000,000 random numbers in a single calculation (that is, more than about 5% of ran1's period), we recommend the use of ran2, with its much longer period.

Knuth's subtractive routine ran3 seems to be the timing winner among portable routines. Unfortunately the subtractive method is not so well studied, and not a standard. We like to keep ran3 in reserve for a "second opinion," substituting it when we suspect another generator of introducing unwanted correlations into a calculation.

The routine ran4 generates extremely good random deviates, and has some other nice properties, but it is slow. See §7.5 for discussion.

Finally, the quick and dirty in-line generators ranqd1 and ranqd2 are very fast, but they are somewhat machine dependent, and at best only as good as a 32-bit linear congruential generator ever is — in our view not good enough in many situations. We would use these only in very special cases, where speed is critical.

CITED REFERENCES AND FURTHER READING:
Park, S.K., and Miller, K.W. 1988, Communications of the ACM, vol. 31, pp. 1192–1201. [1]
Schrage, L. 1979, ACM Transactions on Mathematical Software, vol. 5, pp. 132–138. [2]
Bratley, P., Fox, B.L., and Schrage, E.L. 1983, A Guide to Simulation (New York: Springer-Verlag). [3]
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), §§3.2–3.3. [4]
Kahaner, D., Moler, C., and Nash, S. 1989, Numerical Methods and Software (Englewood Cliffs, NJ: Prentice Hall), Chapter 10. [5]
L'Ecuyer, P. 1988, Communications of the ACM, vol. 31, pp. 742–774. [6]
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977, Computer Methods for Mathematical Computations (Englewood Cliffs, NJ: Prentice-Hall), Chapter 10.

7.2 Transformation Method: Exponential and Normal Deviates

In the previous section, we learned how to generate random deviates with a uniform probability distribution, so that the probability of generating a number between x and x + dx, denoted p(x) dx, is given by

    p(x)\,dx = \begin{cases} dx & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}    (7.2.1)

The probability distribution p(y) of some function y(x) of a uniform deviate follows from the fundamental transformation law of probabilities, |p(y) dy| = |p(x) dx|, that is, p(y) = p(x) |dx/dy|. As an example, take y(x) ≡ -ln(x). Then |dx/dy| = e^{-y}, so that p(y) dy = e^{-y} dy, which is the exponential distribution, the distribution of waiting times between independent Poisson-random events. The following routine generates such deviates:

#include <math.h>

float expdev(long *idum)
/* Returns an exponentially distributed, positive, random deviate of unit
   mean, using ran1(idum) as the source of uniform deviates. */
{
    float ran1(long *idum);
    float dum;

    do
        dum=ran1(idum);
    while (dum == 0.0);
    return -log(dum);
}
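A minimal test driver of our own, assuming expdev and ran1 above are compiled into the same program; the sample mean of unit-mean exponential deviates should approach 1.0:

#include <stdio.h>

float expdev(long *idum);

int main(void)
{
    long idum = -7;          /* negative seed initializes ran1 */
    double sum = 0.0;
    int i, n = 100000;

    for (i = 0; i < n; i++) sum += expdev(&idum);
    printf("sample mean = %f (expect about 1.0)\n", sum/n);
    return 0;
}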

Figure 7.2.1. Transformation method for generating a random deviate y from a known probability distribution p(y). The indefinite integral of p(y) must be known and invertible. A uniform deviate x is chosen between 0 and 1. Its corresponding y on the definite-integral curve is the desired deviate.

Let's see what is involved in using the above transformation method to generate some arbitrary desired distribution of y's, say one with p(y) = f(y) for some positive function f whose integral is 1. (See Figure 7.2.1.) According to (7.2.4), we need to solve the differential equation

    \frac{dx}{dy} = f(y)    (7.2.6)

But the solution of this is just x = F(y), where F(y) is the indefinite integral of f(y). The desired transformation which takes a uniform deviate into one distributed as f(y) is therefore

    y(x) = F^{-1}(x)    (7.2.7)

where F^{-1} is the inverse function to F. Whether (7.2.7) is feasible to implement depends on whether the inverse function of the integral of f(y) is itself feasible to compute, either analytically or numerically. Sometimes it is, and sometimes it isn't.

Incidentally, (7.2.7) has an immediate geometric interpretation: Since F(y) is the area under the probability curve to the left of y, (7.2.7) is just the prescription: choose a uniform random x, then find the value y that has that fraction x of probability area to its left, and return the value y.

Normal (Gaussian) Deviates

Transformation methods generalize to more than one dimension. If x_1, x_2, ... are random deviates with a joint probability distribution p(x_1, x_2, ...) dx_1 dx_2 ..., and if y_1, y_2, ... are each functions of all the x's (same number of y's as x's), then the joint probability distribution of the y's is

    p(y_1,y_2,\ldots)\,dy_1\,dy_2\cdots = p(x_1,x_2,\ldots)\left|\frac{\partial(x_1,x_2,\ldots)}{\partial(y_1,y_2,\ldots)}\right| dy_1\,dy_2\cdots    (7.2.8)

where |∂()/∂()| is the Jacobian determinant of the x's with respect to the y's (or reciprocal of the Jacobian determinant of the y's with respect to the x's).
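Returning for a moment to the one-dimensional prescription (7.2.7), a worked illustration of our own: for the density p(y) = 2y on (0,1), the indefinite integral is F(y) = y², so y = F^{-1}(x) = √x turns a uniform deviate into one with that density. The sample mean should approach ∫ y · 2y dy = 2/3. The sketch assumes ran1 above is compiled into the same program.

#include <stdio.h>
#include <math.h>

float ran1(long *idum);

int main(void)
{
    long idum = -13;
    double sum = 0.0;
    int i, n = 100000;

    for (i = 0; i < n; i++)
        sum += sqrt(ran1(&idum));   /* y = F^{-1}(x) = sqrt(x) */
    printf("sample mean = %f (expect about 0.6667)\n", sum/n);
    return 0;
}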

An important example of the use of (7.2.8) is the Box-Muller method for generating random deviates with a normal (Gaussian) distribution,

    p(y)\,dy = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy    (7.2.9)

Consider the transformation between two uniform deviates on (0,1), x_1 and x_2, and two quantities y_1, y_2,

    y_1 = \sqrt{-2\ln x_1}\,\cos 2\pi x_2
    y_2 = \sqrt{-2\ln x_1}\,\sin 2\pi x_2    (7.2.10)

Equivalently we can write

    x_1 = \exp\left[-\tfrac{1}{2}(y_1^2+y_2^2)\right]
    x_2 = \frac{1}{2\pi}\arctan\frac{y_2}{y_1}    (7.2.11)

Now the Jacobian determinant of the partial derivatives ∂x_i/∂y_j can readily be calculated (try it!):

    \frac{\partial(x_1,x_2)}{\partial(y_1,y_2)} = -\left[\frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2}\right]    (7.2.12)

Since this is the product of a function of y_1 alone and a function of y_2 alone, we see that each y is independently distributed according to the normal distribution (7.2.9).

One further trick is useful in applying (7.2.10). Suppose that, instead of picking uniform deviates x_1 and x_2 in the unit square, we instead pick v_1 and v_2 as the ordinate and abscissa of a random point inside the unit circle around the origin. Then the sum of their squares, R² ≡ v_1² + v_2², is a uniform deviate, which can be used for x_1, while the angle that (v_1, v_2) defines with respect to the v_1 axis can serve as the random angle 2πx_2. What's the advantage? It's that the cosine and sine in (7.2.10) can now be written as v_1/√R² and v_2/√R², obviating the trigonometric function calls!

We thus have

#include <math.h>

float gasdev(long *idum)
/* Returns a normally distributed deviate with zero mean and unit variance,
   using ran1(idum) as the source of uniform deviates. */
{
    float ran1(long *idum);
    static int iset=0;
    static float gset;
    float fac,rsq,v1,v2;

    if (*idum < 0) iset=0;           /* Reinitialize. */
    if (iset == 0) {                 /* We don't have an extra deviate handy,
                                        so */
        do {
            v1=2.0*ran1(idum)-1.0;   /* pick two uniform numbers in the
                                        square extending from -1 to +1 in
                                        each direction, */
            v2=2.0*ran1(idum)-1.0;
            rsq=v1*v1+v2*v2;         /* see if they are in the unit circle, */
        } while (rsq >= 1.0 || rsq == 0.0);   /* and if they are not, try
                                                 again. */
        fac=sqrt(-2.0*log(rsq)/rsq);
        /* Now make the Box-Muller transformation to get two normal deviates.
           Return one and save the other for next time. */
        gset=v1*fac;
        iset=1;                      /* Set flag. */
        return v2*fac;
    } else {                         /* We have an extra deviate handy, */
        iset=0;                      /* so unset the flag, */
        return gset;                 /* and return it. */
    }
}
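A minimal test driver of our own, assuming gasdev and ran1 above are compiled into the same program; the sample mean and variance should approach 0 and 1:

#include <stdio.h>

float gasdev(long *idum);

int main(void)
{
    long idum = -3;                /* negative seed initializes ran1 */
    double s = 0.0, s2 = 0.0, g;
    int i, n = 100000;

    for (i = 0; i < n; i++) {
        g = gasdev(&idum);
        s += g;
        s2 += g*g;
    }
    printf("mean = %f  variance = %f\n", s/n, s2/n - (s/n)*(s/n));
    return 0;
}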

See Devroye [1] and Bratley [2] for many additional algorithms.

CITED REFERENCES AND FURTHER READING:
Devroye, L. 1986, Non-Uniform Random Variate Generation (New York: Springer-Verlag), §9.1. [1]
Bratley, P., Fox, B.L., and Schrage, E.L. 1983, A Guide to Simulation (New York: Springer-Verlag). [2]
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), pp. 116ff.

7.3 Rejection Method: Gamma, Poisson, Binomial Deviates

The rejection method is a powerful, general technique for generating random deviates whose distribution function p(x) dx (probability of a value occurring between x and x + dx) is known and computable. The rejection method does not require that the cumulative distribution function [indefinite integral of p(x)] be readily computable, much less the inverse of that function — which was required for the transformation method in the previous section.

The rejection method is based on a simple geometrical argument: Draw a graph of the probability distribution p(x) that you wish to generate, so that the area under the curve in any range of x corresponds to the desired probability of generating an x in that range. If we had some way of choosing a random point in two dimensions, with uniform probability in the area under your curve, then the x value of that random point would have the desired distribution.

Now, on the same graph, draw any other curve f(x) which has finite (not infinite) area and lies everywhere above your original probability distribution. (This is always possible, because your original curve encloses only unit area, by definition of probability.) We will call this f(x) the comparison function. Imagine now that you have some way of choosing a random point in two dimensions that is uniform in the area under the comparison function. Whenever that point lies outside the area under the original probability distribution, we will reject it and choose another random point. Whenever it lies inside the area under the original probability distribution, we will accept it. It should be obvious that the accepted points are uniform in the accepted area, so that their x values have the desired distribution.

It should also be obvious that the fraction of points rejected just depends on the ratio of the area of the comparison function to the area of the probability distribution function, not on the details of shape of either function. For example, a comparison function whose area is less than 2 will reject fewer than half the points, even if it approximates the probability function very badly at some values of x, e.g., remains finite in some region where p(x) is zero.

Figure 7.3.1. Rejection method for generating a random deviate x from a known probability distribution p(x) that is everywhere less than some other function f(x). The transformation method is first used to generate a random deviate x of the distribution f (compare Figure 7.2.1). A second uniform deviate is used to decide whether to accept or reject that x. If it is rejected, a new deviate of f is found; and so on. The ratio of accepted to rejected points is the ratio of the area under p to the area between p and f.

It remains only to suggest how to choose a uniform random point in two dimensions under the comparison function f(x). A variant of the transformation method (§7.2) does nicely: Be sure to have chosen a comparison function whose indefinite integral is known analytically, and is also analytically invertible to give x as a function of "area under the comparison function to the left of x." Now pick a uniform deviate between 0 and A, where A is the total area under f(x), and use it to get a corresponding x. Then pick a uniform deviate between 0 and f(x) as the y value for the two-dimensional point. You should be able to convince yourself that the point (x, y) is uniformly distributed in the area under the comparison function f(x).

An equivalent procedure is to pick the second uniform deviate between zero and one, and accept or reject according to whether it is respectively less than or greater than the ratio p(x)/f(x).

So, to summarize, the rejection method for some given p(x) requires that one find, once and for all, some reasonably good comparison function f(x). Thereafter, each deviate generated requires two uniform random deviates, one evaluation of f (to get the coordinate y), and one evaluation of p (to decide whether to accept or reject the point x, y). Figure 7.3.1 illustrates the procedure. Then, of course, this procedure must be repeated, on the average, A times before the final deviate is obtained.

Gamma Distribution

The gamma distribution of integer order a > 0 is the waiting time to the ath event in a Poisson random process of unit mean. For example, when a = 1, it is just the exponential distribution of §7.2, the waiting time to the first event.

Gamma Distribution

The gamma distribution of integer order a > 0 is the waiting time to the a-th event in a Poisson random process of unit mean. For example, when a = 1, it is just the exponential distribution of §7.2, the waiting time to the first event.

A gamma deviate has probability p_a(x) dx of occurring with a value between x and x + dx, where

    p_a(x)\,dx = \frac{x^{a-1} e^{-x}}{\Gamma(a)}\,dx, \qquad x > 0        (7.3.1)

To generate deviates of (7.3.1) for small values of a, it is best to add up a exponentially distributed waiting times, i.e., logarithms of uniform deviates. Since the sum of logarithms is the logarithm of the product, one really has only to generate the product of a uniform deviates, then take the log.

For larger values of a, the distribution (7.3.1) has a typically "bell-shaped" form, with a peak at x = a and a half-width of about \sqrt{a}.

We will be interested in several probability distributions with this same qualitative form. A useful comparison function in such cases is derived from the Lorentzian distribution

    p(y)\,dy = \frac{1}{\pi}\left(\frac{1}{1+y^2}\right) dy        (7.3.2)

whose inverse indefinite integral is just the tangent function. It follows that the x-coordinate of an area-uniform random point under the comparison function

    f(x) = \frac{c_0}{1 + (x - x_0)^2 / a_0^2}        (7.3.3)

for any constants a_0, c_0, and x_0, can be generated by the prescription

    x = a_0 \tan(\pi U) + x_0        (7.3.4)

where U is a uniform deviate between 0 and 1. Thus, for some specific "bell-shaped" probability distribution p(x), we need only find constants a_0, x_0, c_0, with the product a_0 c_0 (which determines the area) as small as possible, such that (7.3.3) is everywhere greater than p(x).

Ahrens has done this for the gamma distribution, yielding the following algorithm (as described in Knuth [1]):

#include <math.h>

float gamdev(int ia, long *idum)
/* Returns a deviate distributed as a gamma distribution of integer order ia, i.e.,
   a waiting time to the ia-th event in a Poisson process of unit mean, using
   ran1(idum) as the source of uniform deviates. */
{
    float ran1(long *idum);
    void nrerror(char error_text[]);
    int j;
    float am,e,s,v1,v2,x,y;

    if (ia < 1) nrerror("Error in routine gamdev");
    if (ia < 6) {                       /* Use direct method, adding waiting times. */
        x=1.0;
        for (j=1;j<=ia;j++) x *= ran1(idum);
        x = -log(x);
    } else {                            /* Use rejection method. */
        do {
            do {
                do {                    /* These four lines generate the tangent of */
                    v1=ran1(idum);      /* a random angle, i.e., they are equivalent */
                    v2=2.0*ran1(idum)-1.0;             /* to y = tan(pi*ran1(idum)). */
                } while (v1*v1+v2*v2 > 1.0);
                y=v2/v1;
                am=ia-1;
                s=sqrt(2.0*am+1.0);
                x=s*y+am;               /* We decide whether to reject x: */
            } while (x <= 0.0);         /* Reject in region of zero probability. */
            e=(1.0+y*y)*exp(am*log(x/am)-s*y);   /* Ratio of prob. fn. to comparison fn. */
        } while (ran1(idum) > e);       /* Reject on basis of a second uniform deviate. */
    }
    return x;
}
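A quick empirical check (our sketch, not part of the book's code): the distribution (7.3.1) has mean a and variance a, so the average of many calls to gamdev should reproduce ia. This assumes ran1 from §7.1 is linked and seeded by a first call with a negative idum, per that routine's convention.

#include <stdio.h>

float gamdev(int ia, long *idum);

int main(void)
{
    long idum=(-7);                     /* any negative seed initializes ran1 */
    float sum=0.0;
    int i;

    for (i=1;i<=100000;i++) sum += gamdev(3,&idum);
    printf("sample mean = %f (expect about 3)\n",sum/100000.0);
    return 0;
}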

Poisson Deviates

The Poisson distribution is conceptually related to the gamma distribution. It gives the probability of a certain integer number m of unit rate Poisson random events occurring in a given interval of time x, while the gamma distribution was the probability of waiting time between x and x + dx to the m-th event. Note that m takes on only integer values ≥ 0, so that the Poisson distribution, viewed as a continuous distribution function p_x(m) dm, is zero everywhere except where m is an integer ≥ 0. At such places, it is infinite, such that the integrated probability over a region containing the integer is some finite number. The total probability at an integer j is

    \mathrm{Prob}(j) = \int_{j-\epsilon}^{j+\epsilon} p_x(m)\,dm = \frac{x^j e^{-x}}{j!}        (7.3.5)

At first sight this might seem an unlikely candidate distribution for the rejection method, since no continuous comparison function can be larger than the infinitely tall, but infinitely narrow, Dirac delta functions in p_x(m). However, there is a trick that we can do: Spread the finite area in the spike at j uniformly into the interval between j and j + 1. This defines a continuous distribution q_x(m) dm given by

    q_x(m)\,dm = \frac{x^{[m]} e^{-x}}{[m]!}\,dm        (7.3.6)

where [m] represents the largest integer less than m. If we now use the rejection method to generate a (noninteger) deviate from (7.3.6), and then take the integer part of that deviate, it will be as if drawn from the desired distribution (7.3.5). (See Figure 7.3.2.) This trick is general for any integer-valued probability distribution.

For x large enough, the distribution (7.3.6) is qualitatively bell-shaped (albeit with a bell made out of small, square steps), and we can use the same kind of Lorentzian comparison function as was already used above. For small x, we can generate independent exponential deviates (waiting times between events); when the sum of these first exceeds x, then the number of events that would have occurred in waiting time x becomes known and is one less than the number of terms in the sum.

These ideas produce the following routine:

Figure 7.3.2. Rejection method as applied to an integer-valued distribution. The method is performed on the step function shown as a dashed line, yielding a real-valued deviate. This deviate is rounded down to the next lower integer, which is output.

#include <math.h>
#define PI 3.141592654

float poidev(float xm, long *idum)
/* Returns as a floating-point number an integer value that is a random deviate
   drawn from a Poisson distribution of mean xm, using ran1(idum) as a source of
   uniform random deviates. */
{
    float gammln(float xx);
    float ran1(long *idum);
    static float sq,alxm,g,oldm=(-1.0); /* oldm is a flag for whether xm has changed
                                           since the last call. */
    float em,t,y;

    if (xm < 12.0) {                    /* Use direct method. */
        if (xm != oldm) {
            oldm=xm;
            g=exp(-xm);                 /* If xm is new, compute the exponential. */
        }
        em = -1;
        t=1.0;
        do {                            /* Instead of adding exponential deviates it is
                                           equivalent to multiply uniform deviates. We
                                           never actually have to take the log, merely
                                           compare to the pre-computed exponential. */
            ++em;
            t *= ran1(idum);
        } while (t > g);
    } else {                            /* Use rejection method. */
        if (xm != oldm) {               /* If xm has changed since the last call, then
                                           precompute some functions that occur below. */
            oldm=xm;
            sq=sqrt(2.0*xm);
            alxm=log(xm);
            g=xm*alxm-gammln(xm+1.0);   /* The function gammln is the natural log of
                                           the gamma function, as given in Section 6.1. */
        }
        do {
            do {                        /* y is a deviate from a Lorentzian comparison
                                           function. */
                y=tan(PI*ran1(idum));
                em=sq*y+xm;             /* em is y, shifted and scaled. */
            } while (em < 0.0);         /* Reject if in regime of zero probability. */
            em=floor(em);               /* The trick for integer-valued distributions. */
            t=0.9*(1.0+y*y)*exp(em*alxm-gammln(em+1.0)-g);
            /* The ratio of the desired distribution to the comparison function; we
               accept or reject by comparing it to another uniform deviate. The factor
               0.9 is chosen so that t never exceeds 1. */
        } while (ran1(idum) > t);
    }
    return em;
}

Binomial Deviates

If an event occurs with probability q, and we make n trials, then the number of times m that it occurs has the binomial distribution,

    \int_{j-\epsilon}^{j+\epsilon} p_{n,q}(m)\,dm = \binom{n}{j} q^j (1-q)^{n-j}        (7.3.7)

The binomial distribution is integer valued, with m taking on possible values from 0 to n. It depends on two parameters, n and q, so is correspondingly a bit harder to implement than our previous examples. Nevertheless, the techniques already illustrated are sufficiently powerful to do the job:

#include <math.h>
#define PI 3.141592654

float bnldev(float pp, int n, long *idum)
/* Returns as a floating-point number an integer value that is a random deviate
   drawn from a binomial distribution of n trials each of probability pp, using
   ran1(idum) as a source of uniform random deviates. */
{
    float gammln(float xx);
    float ran1(long *idum);
    int j;
    static int nold=(-1);
    float am,em,g,angle,p,bnl,sq,t,y;
    static float pold=(-1.0),pc,plog,pclog,en,oldg;

    p=(pp <= 0.5 ? pp : 1.0-pp);
    /* The binomial distribution is invariant under changing pp to 1-pp, if we also
       change the answer to n minus itself; we'll remember to do this below. */
    am=n*p;                             /* This is the mean of the deviate to be produced. */
    if (n < 25) {                       /* Use the direct method while n is not too large.
                                           This can require up to 25 calls to ran1. */
        bnl=0.0;
        for (j=1;j<=n;j++)
            if (ran1(idum) < p) ++bnl;
    } else if (am < 1.0) {              /* If fewer than one event is expected out of 25
                                           or more trials, then the distribution is quite
                                           accurately Poisson. Use direct Poisson method. */
        g=exp(-am);
        t=1.0;
        for (j=0;j<=n;j++) {
            t *= ran1(idum);
            if (t < g) break;
        }
        bnl=(j <= n ? j : n);
    } else {                            /* Use the rejection method. */
        if (n != nold) {                /* If n has changed, then compute useful quantities. */
            en=n;
            oldg=gammln(en+1.0);
            nold=n;
        }
        if (p != pold) {                /* If p has changed, then compute useful quantities. */
            pc=1.0-p;
            plog=log(p);
            pclog=log(pc);
            pold=p;
        }
        sq=sqrt(2.0*am*pc);             /* The following code should by now seem familiar:
                                           rejection method with a Lorentzian comparison
                                           function. */
        do {
            do {
                angle=PI*ran1(idum);
                y=tan(angle);
                em=sq*y+am;
            } while (em < 0.0 || em >= (en+1.0));   /* Reject. */
            em=floor(em);               /* Trick for integer-valued distribution. */
            t=1.2*sq*(1.0+y*y)*exp(oldg-gammln(em+1.0)
                -gammln(en-em+1.0)+em*plog+(en-em)*pclog);
        } while (ran1(idum) > t);       /* Reject. This happens about 1.5 times per
                                           deviate, on average. */
        bnl=em;
    }
    if (p != pp) bnl=n-bnl;             /* Remember to undo the symmetry transformation. */
    return bnl;
}
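Both routines can be spot-checked against their known first moments, E[m] = xm for the Poisson deviate and E[m] = n pp for the binomial deviate. The harness below is our illustrative sketch, not the book's code; it assumes ran1 and gammln are linked as the routines above require, and the parameters are chosen so that both rejection branches are exercised.

#include <stdio.h>

float poidev(float xm, long *idum);
float bnldev(float pp, int n, long *idum);

int main(void)
{
    long idum=(-13);                    /* negative seed initializes ran1 */
    float sp=0.0,sb=0.0;
    int i;

    for (i=1;i<=100000;i++) {
        sp += poidev(20.0,&idum);       /* xm >= 12, so the rejection branch runs */
        sb += bnldev(0.3,100,&idum);    /* n >= 25 and n*p >= 1: rejection branch */
    }
    printf("Poisson mean  = %f (expect about 20)\n",sp/100000.0);
    printf("binomial mean = %f (expect about 30)\n",sb/100000.0);
    return 0;
}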

See Devroye [2] and Bratley [3] for many additional algorithms.

CITED REFERENCES AND FURTHER READING:
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), pp. 120ff. [1]
Devroye, L. 1986, Non-Uniform Random Variate Generation (New York: Springer-Verlag), §X.4. [2]
Bratley, P., Fox, B.L., and Schrage, E.L. 1983, A Guide to Simulation (New York: Springer-Verlag). [3]

7.4 Generation of Random Bits

The C language gives you useful access to some machine-level bitwise operations such as << (left shift). This section will show you how to put such abilities to good use.

The problem is how to generate single random bits, with 0 and 1 equally probable. Of course you can just generate uniform random deviates between zero and one and use their high-order bit (i.e., test if they are greater than or less than 0.5). However this takes a lot of arithmetic; there are special-purpose applications, such as real-time signal processing, where you want to generate bits very much faster than that.
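(A hedged aside, ours rather than the book's: the "high-order bit" approach just mentioned is a single comparison on a full uniform deviate, as in this fragment, which assumes ran1 from §7.1.)

float ran1(long *idum);                 /* uniform deviate in (0,1), Section 7.1 */

/* One random bit the slow way: compare a full uniform deviate to 0.5. */
int slowbit(long *idum)
{
    return (ran1(idum) > 0.5);          /* 1 with probability 1/2, else 0 */
}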

One method for generating random bits, with two variant implementations, is based on "primitive polynomials modulo 2." The theory of these polynomials is beyond our scope (although §7.7 and §20.3 will give you small tastes of it). Here, suffice it to say that there are special polynomials among those whose coefficients are zero or one. An example is

    x^{18} + x^5 + x^2 + x^1 + x^0        (7.4.1)

which we can abbreviate by just writing the nonzero powers of x, e.g.,

    (18, 5, 2, 1, 0)

Every primitive polynomial modulo 2 of order n (= 18 above) defines a recurrence relation for obtaining a new random bit from the n preceding ones. The recurrence relation is guaranteed to produce a sequence of maximal length, i.e., cycle through all possible sequences of n bits (except all zeros) before it repeats. Therefore one can seed the sequence with any initial bit pattern (except all zeros), and get 2^n - 1 random bits before the sequence repeats.

Let the bits be numbered from 1 (most recently generated) through n (generated n steps ago), and denoted a_1, a_2, ..., a_n. We want to give a formula for a new bit a_0. After generating a_0 we will shift all the bits by one, so that the old a_n is lost, and the old a_0 becomes a_1. We then apply the formula again, and so on.

"Method I" is the easiest to implement in hardware, requiring only a single shift register n bits long and a few XOR ("exclusive or" or bit addition mod 2) gates, the operation denoted in C by "^". For the primitive polynomial given above, the recurrence formula is

    a_0 = a_{18} \wedge a_5 \wedge a_2 \wedge a_1        (7.4.2)

The terms that are XOR'd together can be thought of as "taps" on the shift register, XOR'd into the register's input. More generally, there is precisely one term for each nonzero coefficient in the primitive polynomial except the constant (zero bit) term. So the first term will always be a_n for a primitive polynomial of degree n, while the last term might or might not be a_1, depending on whether the primitive polynomial has a term in x^1.

While it is simple in hardware, Method I is somewhat cumbersome in C, because the individual bits must be collected by a sequence of full-word masks:

int irbit1(unsigned long *iseed)
/* Returns as an integer a random bit, based on the 18 low-significance bits in
   iseed (which is modified for the next call). */
{
    unsigned long newbit;               /* The accumulated XOR's. */

    newbit =  (*iseed >> 17) & 1        /* Get bit 18. */
            ^ (*iseed >> 4) & 1         /* XOR with bit 5. */
            ^ (*iseed >> 1) & 1         /* XOR with bit 2. */
            ^ (*iseed & 1);             /* XOR with bit 1. */
    *iseed=(*iseed << 1) | newbit;      /* Leftshift the seed and put the result of
                                           the XOR's in its bit 1. */
    return (int) newbit;
}
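As a quick empirical check (our sketch, not from the book): the maximal-length property means that, starting from any nonzero seed, the low 18 bits of iseed should first return to the starting pattern after exactly 2^18 - 1 = 262143 calls.

#include <stdio.h>

int irbit1(unsigned long *iseed);

int main(void)
{
    unsigned long iseed=1,start;
    long count=0;

    start=iseed & 0x3ffffL;             /* remember the initial 18-bit register state */
    do {
        irbit1(&iseed);
        count++;
    } while ((iseed & 0x3ffffL) != start);   /* only the low 18 bits are the register */
    printf("period = %ld (expect 262143)\n",count);
    return 0;
}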

Figure 7.4.1. Two related methods for obtaining random bits from a shift register and a primitive polynomial modulo 2. (a) The contents of selected taps are combined by exclusive-or (addition modulo 2), and the result is shifted in from the right. This method is easiest to implement in hardware. (b) Selected bits are modified by exclusive-or with the leftmost bit, which is then shifted in from the right. This method is easiest to implement in software.

"Method II" is less suited to direct hardware implementation (though still possible), but is beautifully suited to C. It modifies more than one bit among the saved n bits as each new bit is generated (Figure 7.4.1). It generates the maximal length sequence, but not in the same order as Method I. The prescription for the primitive polynomial (7.4.1) is:

    a_0 = a_{18}
    a_5 = a_5 \wedge a_0
    a_2 = a_2 \wedge a_0        (7.4.3)
    a_1 = a_1 \wedge a_0

In general there will be an exclusive-or for each nonzero term in the primitive polynomial except 0 and n. The nice feature about Method II is that all the exclusive-or's can usually be done as a single full-word exclusive-or operation:

#define IB1 1                           /* Powers of 2. */
#define IB2 2
#define IB5 16
#define IB18 131072
#define MASK (IB1+IB2+IB5)

int irbit2(unsigned long *iseed)
/* Returns as an integer a random bit, based on the 18 low-significance bits in
   iseed (which is modified for the next call). */
{
    if (*iseed & IB18) {                /* Change all masked bits, shift, and put 1
                                           into bit 1. */
        *iseed=((*iseed ^ MASK) << 1) | IB1;
        return 1;
    } else {                            /* Shift and put 0 into bit 1. */
        *iseed <<= 1;
        return 0;
    }
}

Some Primitive Polynomials Modulo 2 (after Watson)

(1, 0)                      (51, 6, 3, 1, 0)
(2, 1, 0)                   (52, 3, 0)
(3, 1, 0)                   (53, 6, 2, 1, 0)
(4, 1, 0)                   (54, 6, 5, 4, 3, 2, 0)
(5, 2, 0)                   (55, 6, 2, 1, 0)
(6, 1, 0)                   (56, 7, 4, 2, 0)
(7, 1, 0)                   (57, 5, 3, 2, 0)
(8, 4, 3, 2, 0)             (58, 6, 5, 1, 0)
(9, 4, 0)                   (59, 6, 5, 4, 3, 1, 0)
(10, 3, 0)                  (60, 1, 0)
(11, 2, 0)                  (61, 5, 2, 1, 0)
(12, 6, 4, 1, 0)            (62, 6, 5, 3, 0)
(13, 4, 3, 1, 0)            (63, 1, 0)
(14, 5, 3, 1, 0)            (64, 4, 3, 1, 0)
(15, 1, 0)                  (65, 4, 3, 1, 0)
(16, 5, 3, 2, 0)            (66, 8, 6, 5, 3, 2, 0)
(17, 3, 0)                  (67, 5, 2, 1, 0)
(18, 5, 2, 1, 0)            (68, 7, 5, 1, 0)
(19, 5, 2, 1, 0)            (69, 6, 5, 2, 0)
(20, 3, 0)                  (70, 5, 3, 1, 0)
(21, 2, 0)                  (71, 5, 3, 1, 0)
(22, 1, 0)                  (72, 6, 4, 3, 2, 1, 0)
(23, 5, 0)                  (73, 4, 3, 2, 0)
(24, 4, 3, 1, 0)            (74, 7, 4, 3, 0)
(25, 3, 0)                  (75, 6, 3, 1, 0)
(26, 6, 2, 1, 0)            (76, 5, 4, 2, 0)
(27, 5, 2, 1, 0)            (77, 6, 5, 2, 0)
(28, 3, 0)                  (78, 7, 2, 1, 0)
(29, 2, 0)                  (79, 4, 3, 2, 0)
(30, 6, 4, 1, 0)            (80, 7, 5, 3, 2, 1, 0)
(31, 3, 0)                  (81, 4, 0)
(32, 7, 5, 3, 2, 1, 0)      (82, 8, 7, 6, 4, 1, 0)
(33, 6, 4, 1, 0)            (83, 7, 4, 2, 0)
(34, 7, 6, 5, 2, 1, 0)      (84, 8, 7, 5, 3, 1, 0)
(35, 2, 0)                  (85, 8, 2, 1, 0)
(36, 6, 5, 4, 2, 1, 0)      (86, 6, 5, 2, 0)
(37, 5, 4, 3, 2, 1, 0)      (87, 7, 5, 1, 0)
(38, 6, 5, 1, 0)            (88, 8, 5, 4, 3, 1, 0)
(39, 4, 0)                  (89, 6, 5, 3, 0)
(40, 5, 4, 3, 0)            (90, 5, 3, 2, 0)
(41, 3, 0)                  (91, 7, 6, 5, 3, 2, 0)
(42, 5, 4, 3, 2, 1, 0)      (92, 6, 5, 2, 0)
(43, 6, 4, 3, 0)            (93, 2, 0)
(44, 6, 5, 2, 0)            (94, 6, 5, 1, 0)
(45, 4, 3, 1, 0)            (95, 6, 5, 4, 2, 1, 0)
(46, 8, 5, 3, 2, 1, 0)      (96, 7, 6, 4, 3, 2, 0)
(47, 5, 0)                  (97, 6, 0)
(48, 7, 5, 4, 2, 1, 0)      (98, 7, 4, 3, 2, 1, 0)
(49, 6, 5, 4, 0)            (99, 7, 5, 4, 0)
(50, 4, 3, 2, 0)            (100, 8, 7, 2, 0)
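To illustrate how a table entry is used (a hedged sketch, ours, not one of the book's routines): the row (31, 3, 0) translates directly into a Method II generator by masking the interior tap, here just bit 3, exactly as irbit2 masks bits 5, 2, and 1 for (18, 5, 2, 1, 0). The names JB1, JB3, JB31, and JMASK are illustrative.

#define JB1  1                          /* bit 1, the shifted-in bit */
#define JB3  4                          /* bit 3: the single interior tap of (31, 3, 0) */
#define JB31 0x40000000L                /* bit 31: the high tap */
#define JMASK (JB3)                     /* interior taps only; powers 31 and 0 excluded */

/* Method II random bit from the primitive polynomial (31, 3, 0); period 2^31 - 1. */
int irbit31(unsigned long *iseed)
{
    if (*iseed & JB31) {
        *iseed=((*iseed ^ JMASK) << 1) | JB1;
        return 1;
    } else {
        *iseed <<= 1;
        return 0;
    }
}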

A word of caution is: Don't use sequential bits from these routines as the bits of a large, supposedly random, integer, or as the bits in the mantissa of a supposedly random floating-point number. They are not very random for that purpose; see Knuth [1]. Examples of acceptable uses of these random bits are: (i) multiplying a signal randomly by ±1 at a rapid "chip rate," so as to spread its spectrum uniformly (but recoverably) across some desired bandpass, or (ii) Monte Carlo exploration of a binary tree, where decisions as to whether to branch left or right are to be made randomly.

Now we do not want you to go through life thinking that there is something special about the primitive polynomial of degree 18 used in the above examples. (We chose 18 because 2^18 is small enough for you to verify our claims directly by numerical experiment.) The accompanying table [2] lists one primitive polynomial for each degree up to 100. (In fact there exist many such for each degree. For example, see §7.7 for a complete table up to degree 10.)

CITED REFERENCES AND FURTHER READING:
Knuth, D.E. 1981, Seminumerical Algorithms, 2nd ed., vol. 2 of The Art of Computer Programming (Reading, MA: Addison-Wesley), pp. 29ff. [1]
Horowitz, P., and Hill, W. 1989, The Art of Electronics, 2nd ed. (Cambridge: Cambridge University Press), §§9.32–9.37.
Tausworthe, R.C. 1965, Mathematics of Computation, vol. 19, pp. 201–209.
Watson, E.J. 1962, Mathematics of Computation, vol. 16, pp. 368–369. [2]

7.5 Random Sequences Based on Data Encryption

In Numerical Recipes' first edition, we described how to use the Data Encryption Standard (DES) [1-3] for the generation of random numbers. Unfortunately, when implemented in software in a high-level language like C, DES is very slow, so excruciatingly slow, in fact, that our previous implementation can be viewed as more mischievous than useful. Here we give a much faster and simpler algorithm which, though it may not be secure in the cryptographic sense, generates about equally good random numbers.

DES, like its progenitor cryptographic system LUCIFER, is a so-called "block product cipher" [4]. It acts on 64 bits of input by iteratively applying (16 times, in fact) a kind of highly nonlinear bit-mixing function. Figure 7.5.1 shows the flow of information in DES during this mixing. The function g, which takes 32 bits into 32 bits, is called the "cipher function." Meyer and Matyas [4] discuss the importance of the cipher function being nonlinear, as well as other design criteria.

DES constructs its cipher function g from an intricate set of bit permutations and table lookups acting on short sequences of consecutive bits. Apparently, this function was chosen to be particularly strong cryptographically (or conceivably, as some critics contend, to have an exquisitely subtle cryptographic flaw!). For our purposes, a different function g that can be rapidly computed in a high-level computer language is preferable. Such a function may weaken the algorithm cryptographically.
Our purposes are not, however, cryptographic: We want to find the fastest g, and smallest number of iterations of the mixing procedure in Figure 7.5.1, such that our output random sequence passes the standard tests that are customarily applied to random number generators. The resulting algorithm will not be DES, but rather a kind of "pseudo-DES," better suited to the purpose at hand.

Following the criterion, mentioned above, that g should be nonlinear, we must give the integer multiply operation a prominent place in g. Because 64-bit registers are not generally accessible in high-level languages, we must confine ourselves to multiplying 16-bit operands into a 32-bit result.

Figure 7.5.1. The Data Encryption Standard (DES) iterates a nonlinear function g on two 32-bit words, in the manner shown here (after Meyer and Matyas [4]).

So, the general idea of g, almost forced, is to calculate the three distinct 32-bit products of the high and low 16-bit input half-words, and then to combine these, and perhaps additional fixed constants, by fast operations (e.g., add or exclusive-or) into a single 32-bit result.

There are only a limited number of ways of effecting this general scheme, allowing systematic exploration of the alternatives. Experimentation, and tests of the randomness of the output, lead to the sequence of operations shown in Figure 7.5.2. The few new elements in the figure need explanation: The values C1 and C2 are fixed constants, chosen randomly with the constraint that they have exactly 16 1-bits and 16 0-bits; combining these constants via exclusive-or ensures that the overall g has no bias towards 0 or 1 bits. The "reverse half-words" operation in Figure 7.5.2 turns out to be essential; otherwise, the very lowest and very highest bits are not properly mixed by the three multiplications. The nonobvious choices in g are therefore: where along the vertical "pipeline" to do the reverse; in what order to combine the three products and C2; and with which operation (add or exclusive-or) should each combining be done? We tested these choices exhaustively before settling on the algorithm shown in the figure.

It remains to determine the smallest number of iterations N_it that we can get away with. The minimum meaningful N_it is evidently two, since a single iteration simply moves one 32-bit word without altering it. One can use the constants C1 and C2 to help determine an appropriate N_it: When N_it = 2 and C1 = C2 = 0 (an intentionally very poor choice), the generator fails several tests of randomness by easily measurable, though not overwhelming, amounts. When N_it = 4, on the other hand, or with N_it = 2 but with the constants C1, C2 nonsparse, we have been unable to find any statistical deviation from randomness in sequences of up to 10^9 floating numbers r_i derived from this scheme. The combined strength of N_it = 4 and nonsparse C1, C2 should therefore give sequences that are random to tests even far beyond those that we have actually tried. These are our recommended conservative parameter values, notwithstanding the fact that N_it = 2 (which is, of course, twice as fast) has no nonrandomness discernible (by us).

Figure 7.5.2. The nonlinear function g used by the routine psdes. [The figure shows the pipeline: XOR with C1; squaring of the low and high half-words; a NOT; the reverse-half-words swap; XOR with C2; and additions combining the products.]

Implementation of these ideas is straightforward. The following routine is not quite strictly portable, since it assumes that unsigned long integers are 32-bits, as is the case on most machines. However, there is no reason to believe that longer integers would be in any way inferior (with suitable extensions of the constants C1, C2). C does not provide a convenient, portable way to divide a long integer into half words, so we must use a combination of masking (& 0xffff) with left- and right-shifts by 16 bits (<<16 and >>16). On some machines the half-word extraction could be made faster by the use of C's union construction, but this would generally not be portable between "big-endian" and "little-endian" machines. (Big- and little-endian refer to the order in which the bytes are stored in a word.)

#define NITER 4

void psdes(unsigned long *lword, unsigned long *irword)
/* "Pseudo-DES" hashing of the 64-bit word (lword,irword). Both 32-bit arguments
   are returned hashed on all bits. */
{
    unsigned long i,ia,ib,iswap,itmph=0,itmpl=0;
    static unsigned long c1[NITER]={
        0xbaa96887L, 0x1e17d32cL, 0x03bcdc3cL, 0x0f33d1b2L};
    static unsigned long c2[NITER]={
        0x4b0f3b58L, 0xe874f0c3L, 0x6955c5a6L, 0x55a7ca46L};

    for (i=0;i<NITER;i++) {
        /* Perform NITER iterations of DES-like logic, using the simpler
           (noncryptographic) nonlinear function of Figure 7.5.2. */
        ia=(iswap=(*irword)) ^ c1[i];   /* The bit-rich constants c1 and (below) c2
                                           guarantee lots of nonlinear mixing. */
        itmpl = ia & 0xffff;
        itmph = ia >> 16;
        ib=itmpl*itmpl+ ~(itmph*itmph);
        *irword=(*lword) ^ (((ia = (ib >> 16) |
            ((ib & 0xffff) << 16)) ^ c2[i])+itmpl*itmph);
        *lword=iswap;
    }
}

The routine ran4, listed below, uses psdes to generate uniform random deviates. We adopt the convention that a negative value of the argument idum sets the left 32-bit word, while a positive value i sets the right 32-bit word, returns the i-th random deviate, and increments idum to i+1. This is no more than a convenient way of defining many different sequences (negative values of idum), but still with random access to each sequence (positive values of idum). For getting a floating-point number from the 32-bit integer, we like to do it by the masking trick described at the end of §7.1, above. The hex constants 3F800000 and 007FFFFF are the appropriate ones for computers using the IEEE representation for 32-bit floating-point numbers (e.g., IBM PCs and most UNIX workstations). For DEC VAXes, the correct hex constants are, respectively, 00004080 and FFFF007F. For greater portability, you can instead construct a floating number by making the (signed) 32-bit integer nonnegative (typically, you add exactly 2^31 if it is negative) and then multiplying it by a floating constant (typically 2^{-31}).

An interesting, and sometimes useful, feature of the routine ran4, below, is that it allows random access to the n-th random value in a sequence, without the necessity of first generating values 1 ··· n-1. This property is shared by any random number generator based on hashing (the technique of mapping data keys, which may be highly clustered in value, approximately uniformly into a storage address space) [5,6]. One might have a simulation problem in which some certain rare situation becomes recognizable by its consequences only considerably after it has occurred. One may wish to restart the simulation back at that occurrence, using identical random values but, say, varying some other control parameters. The relevant question might then be something like "what random numbers were used in cycle number 337098901?" It might already be cycle number 395100273 before the question comes up. Random generators based on recursion, rather than hashing, cannot easily answer such a question.

float ran4(long *idum)
/* Returns a uniform random deviate in the range 0.0 to 1.0, generated by
   pseudo-DES (DES-like) hashing of the 64-bit word (idums,idum), where idums was
   set by a previous call with negative idum. Also increments idum. Routine can be
   used to generate a random sequence by successive calls, leaving idum unaltered
   between calls; or it can randomly access the nth deviate in a sequence by
   calling with idum = n. Different sequences are initialized by calls with
   differing negative values of idum. */
{
    void psdes(unsigned long *lword, unsigned long *irword);
    unsigned long irword,itemp,lword;
    static long idums = 0;
    /* The hexadecimal constants jflone and jflmsk below are used to produce a
       floating number between 1. and 2. by bitwise masking. They are
       machine-dependent. See text. */
#if defined(vax) || defined(_vax_) || defined(__vax__) || defined(VAX)
    static unsigned long jflone = 0x00004080;
    static unsigned long jflmsk = 0xffff007f;
#else
    static unsigned long jflone = 0x3f800000;
    static unsigned long jflmsk = 0x007fffff;
#endif

    if (*idum < 0) {                    /* Reset idums and prepare to return the
                                           first deviate in its sequence. */
        idums = -(*idum);
        *idum=1;
    }
    irword=(*idum);
    lword=idums;
    psdes(&lword,&irword);              /* "Pseudo-DES" encode the words. */
    itemp=jflone | (jflmsk & irword);   /* Mask to a floating number between 1 and 2. */
    ++(*idum);
    return (*(float *)&itemp)-1.0;      /* Subtraction moves range to 0. to 1. */
}

The accompanying table gives data for verifying that ran4 and psdes work correctly on your machine. We do not advise the use of ran4 unless you are able to reproduce the hex values shown. Typically, ran4 is about 4 times slower than ran0 (§7.1), or about 3 times slower than ran1.

Values for Verifying the Implementation of psdes

            before psdes call       after psdes call (hex)           ran4(idum)
  idum      lword     irword        lword         irword          VAX         PC
   -1           1          1       604D1DCE     509C0C23       0.275898    0.219120
   99           1         99       D97F8571     A66CB41A       0.208204    0.849246
  -99          99          1       7822309D     64300984       0.034307    0.375290
   99          99         99       D7F376F0     59BA89EB       0.838676    0.457334

Successive calls to ran4 with arguments -1, 99, -99, and 99 should produce exactly the lword and irword values shown. Masking conversion to a returned floating random value is allowed to be machine dependent; values for VAX and PC are shown.
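A minimal harness (our sketch, not the book's) for checking the psdes columns of the table: it applies psdes to the four "before" pairs above and prints the results in hex for comparison with the "after" columns.

#include <stdio.h>

void psdes(unsigned long *lword, unsigned long *irword);

int main(void)
{
    /* The four "before psdes call" pairs from the verification table. */
    static unsigned long lw[4]={1,1,99,99},ir[4]={1,99,1,99};
    unsigned long l,r;
    int i;

    for (i=0;i<4;i++) {
        l=lw[i];
        r=ir[i];
        psdes(&l,&r);
        printf("(%2lu,%2lu) -> %08lX %08lX\n",lw[i],ir[i],l,r);
    }
    return 0;                           /* expect the four "after" rows of the table */
}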

CITED REFERENCES AND FURTHER READING:
Data Encryption Standard, 1977 January 15, Federal Information Processing Standards Publication, number 46 (Washington: U.S. Department of Commerce, National Bureau of Standards). [1]
Guidelines for Implementing and Using the NBS Data Encryption Standard, 1981 April 1, Federal Information Processing Standards Publication, number 74 (Washington: U.S. Department of Commerce, National Bureau of Standards). [2]
Validating the Correctness of Hardware Implementations of the NBS Data Encryption Standard, 1980, NBS Special Publication 500–20 (Washington: U.S. Department of Commerce, National Bureau of Standards). [3]
Meyer, C.H. and Matyas, S.M. 1982, Cryptography: A New Dimension in Computer Data Security (New York: Wiley). [4]
Knuth, D.E. 1973, Sorting and Searching, vol. 3 of The Art of Computer Programming (Reading, MA: Addison-Wesley), Chapter 6. [5]
Vitter, J.S., and Chen, W-C. 1987, Design and Analysis of Coalesced Hashing (New York: Oxford University Press). [6]

7.6 Simple Monte Carlo Integration

Inspirations for numerical methods can spring from unlikely sources. "Splines" first were flexible strips of wood used by draftsmen. "Simulated annealing" (we shall see in §10.9) is rooted in a thermodynamic analogy. And who does not feel at least a faint echo of glamor in the name "Monte Carlo method"?

Suppose that we pick N random points, uniformly distributed in a multidimensional volume V. Call them x_1, ..., x_N. Then the basic theorem of Monte Carlo integration estimates the integral of a function f over the multidimensional volume,

    \int f\,dV \approx V\,\langle f\rangle \pm V\sqrt{\frac{\langle f^2\rangle - \langle f\rangle^2}{N}}        (7.6.1)

Here the angle brackets denote taking the arithmetic mean over the N sample points,

    \langle f\rangle \equiv \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad \langle f^2\rangle \equiv \frac{1}{N}\sum_{i=1}^{N} f^2(x_i)        (7.6.2)

The "plus-or-minus" term in (7.6.1) is a one standard deviation error estimate for the integral, not a rigorous bound; further, there is no guarantee that the error is distributed as a Gaussian, so the error term should be taken only as a rough indication of probable error.

Suppose that you want to integrate a function g over a region W that is not easy to sample randomly. For example, W might have a very complicated shape. No problem. Just find a region V that includes W and that can easily be sampled (Figure 7.6.1), and then define f to be equal to g for points in W and equal to zero for points outside of W (but still inside the sampled V). You want to try to make V enclose W as closely as possible, because the zero values of f will increase the error estimate term of (7.6.1). And well they should: points chosen outside of W have no information content, so the effective value of N, the number of points, is reduced. The error estimate in (7.6.1) takes this into account.

General purpose routines for Monte Carlo integration are quite complicated (see §7.8), but a worked example will show the underlying simplicity of the method. Suppose that we want to find the weight and the position of the center of mass of an object of complicated shape, namely the intersection of a torus with the edge of a large box. In particular let the object be defined by the three simultaneous conditions

    z^2 + \left(\sqrt{x^2 + y^2} - 3\right)^2 \le 1        (7.6.3)

(torus centered on the origin with major radius = 4, minor radius = 2)

    x \ge 1, \qquad y \ge -3        (7.6.4)

(two faces of the box, see Figure 7.6.2). Suppose for the moment that the object has a constant density ρ.

We want to estimate the following integrals over the interior of the complicated object:

    \int \rho\,dx\,dy\,dz, \qquad \int x\rho\,dx\,dy\,dz, \qquad \int y\rho\,dx\,dy\,dz, \qquad \int z\rho\,dx\,dy\,dz        (7.6.5)

The coordinates of the center of mass will be the ratio of the latter three integrals (linear moments) to the first one (the weight).

In the following fragment, the region V, enclosing the piece-of-torus W, is the rectangular box extending from 1 to 4 in x, -3 to 4 in y, and -1 to 1 in z.

Figure 7.6.1. Monte Carlo integration. Random points are chosen within the area A. The integral of the function f is estimated as the area of A multiplied by the fraction of random points that fall below the curve f. Refinements on this procedure can improve the accuracy of the method; see text.

Figure 7.6.2. Example of Monte Carlo integration (see text). The region of interest is a piece of a torus, bounded by the intersection of two planes. The limits of integration of the region cannot easily be written in analytically closed form, so Monte Carlo is a useful technique.

#include "nrutil.h"
...
n=...                               /* Set to the number of sample points desired. */
den=...                             /* Set to the constant value of the density. */
sw=swx=swy=swz=0.0;                 /* Zero the various sums to be accumulated. */
varw=varx=vary=varz=0.0;
vol=3.0*7.0*2.0;                    /* Volume of the sampled region. */
for(j=1;j<=n;j++) {
    x=1.0+3.0*ran2(&idum);          /* Pick a point randomly in the sampled region. */
    y=(-3.0)+7.0*ran2(&idum);
    z=(-1.0)+2.0*ran2(&idum);
    if (z*z+SQR(sqrt(x*x+y*y)-3.0) < 1.0) {   /* Is it in the torus? */
        sw += den;                  /* If so, add to the various cumulants. */
        swx += x*den;
        swy += y*den;
        swz += z*den;
        varw += SQR(den);
        varx += SQR(x*den);
        vary += SQR(y*den);
        varz += SQR(z*den);
    }
}
w=vol*sw/n;                         /* The values of the integrals (7.6.5), */
x=vol*swx/n;
y=vol*swy/n;
z=vol*swz/n;
dw=vol*sqrt((varw/n-SQR(sw/n))/n);  /* and their corresponding error estimates. */
dx=vol*sqrt((varx/n-SQR(swx/n))/n);
dy=vol*sqrt((vary/n-SQR(swy/n))/n);
dz=vol*sqrt((varz/n-SQR(swz/n))/n);

A change of variable can often be extremely worthwhile in Monte Carlo integration. Suppose, for example, that we want to evaluate the same integrals, but for a piece-of-torus whose density is a strong function of z, in fact varying according to

    \rho(x,y,z) = e^{5z}        (7.6.6)

One way to do this is to put the statement

    den=exp(5.0*z);

inside the if (...) block, just before den is first used. This will work, but it is a poor way to proceed. Since (7.6.6) falls so rapidly to zero as z decreases (down to its lower limit -1), most sampled points contribute almost nothing to the sum of the weight or moments. These points are effectively wasted, almost as badly as those that fall outside of the region W. A change of variable, exactly as in the transformation methods of §7.2, solves this problem. Let

    s = \tfrac{1}{5}e^{5z}, \qquad z = \tfrac{1}{5}\ln(5s), \qquad \text{so that} \quad ds = e^{5z}\,dz        (7.6.7)

Then ρ dz = ds, and the limits -1 < z < 1 in z become 0.00135 < s < 29.682 in s:

#include "nrutil.h"
...
n=...                               /* Set to the number of sample points desired. */
sw=swx=swy=swz=0.0;
varw=varx=vary=varz=0.0;
ss=0.2*(exp(5.0)-exp(-5.0));        /* Interval of s to be randomly sampled. */
vol=3.0*7.0*ss;                     /* Volume in x,y,s-space. */
for(j=1;j<=n;j++) {
    x=1.0+3.0*ran2(&idum);
    y=(-3.0)+7.0*ran2(&idum);
    s=0.00135+ss*ran2(&idum);       /* Pick a point in s. */
    z=0.2*log(5.0*s);               /* Equation (7.6.7). */
    if (z*z+SQR(sqrt(x*x+y*y)-3.0) < 1.0) {
        sw += 1.0;                  /* Density is 1, since absorbed into definition of s. */
        swx += x;
        swy += y;
        swz += z;
        varw += 1.0;
        varx += x*x;
        vary += y*y;
        varz += z*z;
    }
}
w=vol*sw/n;                         /* The values of the integrals (7.6.5), */
x=vol*swx/n;
y=vol*swy/n;
z=vol*swz/n;
dw=vol*sqrt((varw/n-SQR(sw/n))/n);  /* and their corresponding error estimates. */
dx=vol*sqrt((varx/n-SQR(swx/n))/n);
dy=vol*sqrt((vary/n-SQR(swy/n))/n);
dz=vol*sqrt((varz/n-SQR(swz/n))/n);

If you think for a minute, you will realize that equation (7.6.7) was useful only because the part of the integrand that we wanted to eliminate (e^{5z}) was both integrable analytically, and had an integral that could be analytically inverted. (Compare §7.2.) In general these properties will not hold. Question: What then? Answer: Pull out of the integrand the "best" factor that can be integrated and inverted. The criterion for "best" is to try to reduce the remaining integrand to a function that is as close as possible to constant.

The limiting case is instructive: If you manage to make the integrand f exactly constant, and if the region V, of known volume, exactly encloses the desired region W, then the average of f that you compute will be exactly its constant value, and the error estimate in equation (7.6.1) will exactly vanish. You will, in fact, have done the integral exactly, and the Monte Carlo numerical evaluations are superfluous. So, backing off from the extreme limiting case, to the extent that you are able to make f approximately constant by change of variable, and to the extent that you can sample a region only slightly larger than W, you will increase the accuracy of the Monte Carlo integral. This technique is generically called reduction of variance in the literature.

The fundamental disadvantage of simple Monte Carlo integration is that its accuracy increases only as the square root of N, the number of sampled points. If your accuracy requirements are modest, or if your computer budget is large, then the technique is highly recommended as one of great generality. In the next two sections we will see that there are techniques available for "breaking the square root of N barrier" and achieving, at least in some cases, higher accuracy with fewer function evaluations.
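As a self-contained illustration of equations (7.6.1) and (7.6.2) (our sketch, using ran2 from §7.1 as the fragments above do): estimate the area of the unit circle, π, by sampling the square -1 ≤ x, y ≤ 1, for which f is 1 inside the circle and 0 outside.

#include <stdio.h>
#include <math.h>

float ran2(long *idum);             /* uniform deviate in (0,1), Section 7.1 */

int main(void)
{
    long idum=(-5),n=1000000L,j;
    float x,y,f,sf=0.0,sf2=0.0,vol=4.0,ave,var;

    for (j=1;j<=n;j++) {
        x=(-1.0)+2.0*ran2(&idum);   /* random point in the square, volume V = 4 */
        y=(-1.0)+2.0*ran2(&idum);
        f=(x*x+y*y <= 1.0 ? 1.0 : 0.0);
        sf += f;                    /* accumulate <f> and <f^2>, equation (7.6.2) */
        sf2 += f*f;
    }
    ave=sf/n;
    var=sf2/n-ave*ave;
    printf("pi estimate = %f +- %f\n",vol*ave,vol*sqrt(var/n));  /* eq. (7.6.1) */
    return 0;
}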

CITED REFERENCES AND FURTHER READING:
Hammersley, J.M., and Handscomb, D.C. 1964, Monte Carlo Methods (London: Methuen).
Shreider, Yu. A. (ed.) 1966, The Monte Carlo Method (Oxford: Pergamon).
Sobol', I.M. 1974, The Monte Carlo Method (Chicago: University of Chicago Press).
Kalos, M.H., and Whitlock, P.A. 1986, Monte Carlo Methods (New York: Wiley).

7.7 Quasi- (that is, Sub-) Random Sequences

We have just seen that choosing N points uniformly randomly in an n-dimensional space leads to an error term in Monte Carlo integration that decreases as 1/\sqrt{N}. In essence, each new point sampled adds linearly to an accumulated sum that will become the function average, and also linearly to an accumulated sum of squares that will become the variance (equation 7.6.2). The estimated error comes from the square root of this variance, hence the power N^{-1/2}.

Just because this square root convergence is familiar does not, however, mean that it is inevitable. A simple counterexample is to choose sample points that lie on a Cartesian grid, and to sample each grid point exactly once (in whatever order). The Monte Carlo method thus becomes a deterministic quadrature scheme, albeit a simple one, whose fractional error decreases at least as fast as N^{-1} (even faster if the function goes to zero smoothly at the boundaries of the sampled region, or is periodic in the region). The trouble with a grid is that one has to decide in advance how fine it should be. One is then committed to completing all of its sample points. With a grid, it is not convenient to "sample until" some convergence or termination criterion is met.

One might ask if there is not some intermediate scheme, some way to pick sample points "at random," yet spread out in some self-avoiding way, avoiding the chance clustering that occurs with uniformly random points. A similar question arises for tasks other than Monte Carlo integration. We might want to search an n-dimensional space for a point where some (locally computable) condition holds. Of course, for the task to be computationally meaningful, there had better be continuity, so that the desired condition will hold in some finite n-dimensional neighborhood. We may not know a priori how large that neighborhood is, however. We want to "sample until" the desired point is found, moving smoothly to finer scales with increasing samples. Is there any way to do this that is better than uncorrelated, random samples?

The answer to the above question is "yes." Sequences of n-tuples that fill n-space more uniformly than uncorrelated random points are called quasi-random sequences. That term is somewhat of a misnomer, since there is nothing "random" about quasi-random sequences: They are cleverly crafted to be, in fact, sub-random.
The sample points in a quasi-random sequence are, in a precise sense, "maximally avoiding" of each other.

A conceptually simple example is Halton's sequence [1]. In one dimension, the j-th number H_j in the sequence is obtained by the following steps: (i) Write j as a number in base b, where b is some prime. (For example j = 17 in base b = 3 is 122.) (ii) Reverse the digits and put a radix point (i.e., a decimal point base b) in front of the sequence. (In the example, we get 0.221 base 3.) The result is H_j. To get a sequence of n-tuples in n-space, you make each component a Halton sequence with a different prime base b. Typically, the first n primes are used.
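A minimal sketch of these two steps (ours, not a Numerical Recipes routine): digit-reverse j in base b by repeated division, accumulating the digits behind the radix point with successively finer weights.

#include <stdio.h>

/* j-th Halton number in prime base b, by digit reversal: step (i) peels off the
   base-b digits of j; step (ii) places them after the radix point. */
double halton(unsigned long j, int b)
{
    double h=0.0,f=1.0/b;               /* f is the weight of the current digit */

    while (j > 0) {
        h += f*(j % b);                 /* next digit of j, placed after the radix point */
        j /= b;
        f /= b;                         /* each further digit is b times less significant */
    }
    return h;
}

int main(void)
{
    printf("H_17 base 3 = %f\n",halton(17,3));  /* 0.221 base 3 = 25/27 = 0.925926 */
    return 0;
}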

Figure 7.7.1. First 1024 points of a two-dimensional Sobol' sequence. The sequence is generated number-theoretically, rather than randomly, so successive points at any stage "know" how to fill in the gaps in the previously generated distribution. [Four panels, each spanning 0 to 1 on both axes: points 1 to 128; points 129 to 512; points 513 to 1024; points 1 to 1024.]
It is not hard to see how Halton's sequence works: Every time the number of digits in j increases by one place, j's digit-reversed fraction becomes a factor of b finer-meshed. Thus the process is one of filling in all the points on a sequence of finer and finer Cartesian grids, and in a kind of maximally spread-out order on each grid (since, e.g., the most rapidly changing digit in j controls the most significant digit of the fraction).

Other ways of generating quasi-random sequences have been suggested by Faure, Sobol', Niederreiter, and others. Bratley and Fox [2] provide a good review and references, and discuss a particularly efficient variant of the Sobol' [3] sequence suggested by Antonov and Saleev [4]. It is this Antonov-Saleev variant whose implementation we now discuss.
Other ways of generating quasi-random sequences have been suggested by Faure, Sobol', Niederreiter, and others. Bratley and Fox [2] provide a good review and references, and discuss a particularly efficient variant of the Sobol' [3] sequence suggested by Antonov and Saleev [4]. It is this Antonov-Saleev variant whose implementation we now discuss.
Primitive Polynomials Modulo 2*

Degree	Polynomials (encoded)
1	0 (i.e., x + 1)
2	1 (i.e., x^2 + x + 1)
3	1, 2 (i.e., x^3 + x + 1 and x^3 + x^2 + 1)
4	1, 4 (i.e., x^4 + x + 1 and x^4 + x^3 + 1)
5	2, 4, 7, 11, 13, 14
6	1, 13, 16, 19, 22, 25
7	1, 4, 7, 8, 14, 19, 21, 28, 31, 32, 37, 41, 42, 50, 55, 56, 59, 62
8	14, 21, 22, 38, 47, 49, 50, 52, 56, 67, 70, 84, 97, 103, 115, 122
9	8, 13, 16, 22, 25, 44, 47, 52, 55, 59, 62, 67, 74, 81, 82, 87, 91, 94, 103, 104, 109, 122, 124, 137, 138, 143, 145, 152, 157, 167, 173, 176, 181, 182, 185, 191, 194, 199, 218, 220, 227, 229, 230, 234, 236, 241, 244, 253
10	4, 13, 19, 22, 50, 55, 64, 69, 98, 107, 115, 121, 127, 134, 140, 145, 152, 158, 161, 171, 181, 194, 199, 203, 208, 227, 242, 251, 253, 265, 266, 274, 283, 289, 295, 301, 316, 319, 324, 346, 352, 361, 367, 382, 395, 398, 400, 412, 419, 422, 426, 428, 433, 446, 454, 457, 472, 493, 505, 508

*Expressed as a decimal integer representing the interior bits (that is, omitting the high-order bit and the unit bit).

The Sobol' sequence generates numbers between zero and one directly as binary fractions of length w bits, from a set of w special binary fractions, V_i, i = 1, 2, ..., w, called direction numbers. In Sobol's original method, the jth number X_j is generated by XORing (bitwise exclusive or) together the set of V_i's satisfying the criterion on i, "the ith bit of j is nonzero." As j increments, in other words, different ones of the V_i's flash in and out of X_j on different time scales. V_1 alternates between being present and absent most quickly, while V_k goes from present to absent (or vice versa) only every 2^(k-1) steps.

Antonov and Saleev's contribution was to show that instead of using the bits of the integer j to select direction numbers, one could just as well use the bits of the Gray code of j, G(j). (For a quick review of Gray codes, look at §20.2.)

Now G(j) and G(j+1) differ in exactly one bit position, namely in the position of the rightmost zero bit in the binary representation of j (adding a leading zero to j if necessary). A consequence is that the (j+1)st Sobol'-Antonov-Saleev number can be obtained from the jth by XORing it with a single V_i, namely with i the position of the rightmost zero bit in j. This makes the calculation of the sequence very efficient, as we shall see.
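Both facts used here are easy to check numerically. The fragment below is our own aside, not part of the routine that follows; it assumes the standard binary Gray code, G(j) = j ^ (j >> 1).

#include <stdio.h>

/* Check that G(j) and G(j+1) differ in exactly one bit, located at the
   rightmost zero bit of j (bit positions counted from 1). */
static unsigned long gray(unsigned long j) { return j ^ (j >> 1); }

int main(void)
{
	unsigned long j,m,diff;
	int pos;
	for (j=0;j<8;j++) {
		diff=gray(j)^gray(j+1);			/* exactly one bit is set here */
		for (pos=1,m=j;m & 1;m >>= 1) pos++;	/* rightmost zero bit of j */
		printf("j=%lu  G(j)=%lu  G(j)^G(j+1)=%lu  rightmost zero bit of j at position %d\n",
			j,gray(j),diff,pos);
	}
	return 0;
}

This is exactly the bookkeeping that the sobseq routine below performs when it searches its counter for the rightmost zero bit.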
Figure 7.7.1 plots the first 1024 points generated by a two-dimensional Sobol' sequence. One sees that successive points do "know" about the gaps left previously, and keep filling them in, hierarchically.

We have deferred to this point a discussion of how the direction numbers V_i are generated. Some nontrivial mathematics is involved in that, so we will content ourselves with a cookbook summary only: Each different Sobol' sequence (or component of an n-dimensional sequence) is based on a different primitive polynomial over the integers modulo 2, that is, a polynomial whose coefficients are either 0 or 1, and which generates a maximal length shift register sequence. (Primitive polynomials modulo 2 were used in §7.4, and are further discussed in §20.3.) Suppose P is such a polynomial, of degree q,

$$P = x^q + a_1 x^{q-1} + a_2 x^{q-2} + \cdots + a_{q-1} x + 1 \qquad (7.7.1)$$
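To make the encoding in the table's footnote concrete (this worked example is ours): matching the degree-3 polynomial x^3 + x + 1 against equation (7.7.1) gives interior coefficients a_1 = 0 and a_2 = 1, which, read as the bits of a binary number, give 01, i.e., the decimal value 1; likewise x^3 + x^2 + 1 gives 10, i.e., 2. These are exactly the two degree-3 entries in the table above.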

Initializing Values Used in sobseq

Degree	Polynomial	Starting Values
1	0	1	(3)	(5)	(15)	...
2	1	1	1	(7)	(11)	...
3	1	1	3	7	(5)	...
3	2	1	3	3	(15)	...
4	1	1	1	3	13	...
4	4	1	1	5	9	...

Parenthesized values are not freely specifiable, but are forced by the required recurrence for this degree.

Define a sequence of integers M_i by the q-term recurrence relation,

$$M_i = 2a_1 M_{i-1} \oplus 2^2 a_2 M_{i-2} \oplus \cdots \oplus 2^{q-1} a_{q-1} M_{i-q+1} \oplus (2^q M_{i-q} \oplus M_{i-q}) \qquad (7.7.2)$$

Here bitwise XOR is denoted by ⊕. The starting values for this recurrence are that M_1, ..., M_q can be arbitrary odd integers less than 2, 2^2, ..., 2^q, respectively. Then, the direction numbers V_i are given by

$$V_i = M_i / 2^i \qquad i = 1, \ldots, w \qquad (7.7.3)$$

The accompanying table lists all primitive polynomials modulo 2 with degree q ≤ 10. Since the coefficients are either 0 or 1, and since the coefficients of x^q and of 1 are predictably 1, it is convenient to denote a polynomial by its middle coefficients taken as the bits of a binary number (higher powers of x being more significant bits). The table uses this convention.

Turn now to the implementation of the Sobol' sequence. Successive calls to the function sobseq (after a preliminary initializing call) return successive points in an n-dimensional Sobol' sequence based on the first n primitive polynomials in the table. As given, the routine is initialized for a maximum n of 6 dimensions, and for a word length w of 30 bits. These parameters can be altered by changing MAXBIT (≡ w) and MAXDIM, and by adding more initializing data to the arrays ip (the primitive polynomials from the table), mdeg (their degrees), and iv (the starting values for the recurrence, equation 7.7.2). A second table, above, elucidates the initializing data in the routine.

#include "nrutil.h"
#define MAXBIT 30
#define MAXDIM 6

void sobseq(int *n, float x[])
/* When n is negative, internally initializes a set of MAXBIT direction numbers for
each of MAXDIM different Sobol' sequences. When n is positive (but <= MAXDIM),
returns as the vector x[1..n] the next values from n of these sequences. (n must not
be changed between initializations.) */
{
	int j,k,l;
	unsigned long i,im,ipp;
	static float fac;
	static unsigned long in,ix[MAXDIM+1],*iu[MAXBIT+1];
	static unsigned long mdeg[MAXDIM+1]={0,1,2,3,3,4,4};
	static unsigned long ip[MAXDIM+1]={0,0,1,1,2,1,4};
	static unsigned long iv[MAXDIM*MAXBIT+1]={
		0,1,1,1,1,1,1,3,1,3,3,1,1,5,7,7,3,3,5,15,11,5,15,13,9};

	if (*n < 0) {	/* Initialize, don't return a vector. */
		for (k=1;k<=MAXDIM;k++) ix[k]=0;
		in=0;
		if (iv[1] != 1) return;
		fac=1.0/(1L << MAXBIT);
		for (j=1,k=0;j<=MAXBIT;j++,k+=MAXDIM) iu[j] = &iv[k];	/* To allow both 1D and 2D addressing. */
		for (k=1;k<=MAXDIM;k++) {
			for (j=1;j<=mdeg[k];j++) iu[j][k] <<= (MAXBIT-j);	/* Stored values only require normalization. */
			for (j=mdeg[k]+1;j<=MAXBIT;j++) {	/* Use the recurrence to get other values. */
				ipp=ip[k];
				i=iu[j-mdeg[k]][k];
				i ^= (i >> mdeg[k]);
				for (l=mdeg[k]-1;l>=1;l--) {
					if (ipp & 1) i ^= iu[j-l][k];
					ipp >>= 1;
				}
				iu[j][k]=i;
			}
		}
	} else {	/* Calculate the next vector in the sequence. */
		im=in++;
		for (j=1;j<=MAXBIT;j++) {	/* Find the rightmost zero bit. */
			if (!(im & 1)) break;
			im >>= 1;
		}
		if (j > MAXBIT) nrerror("MAXBIT too small in sobseq");
		im=(j-1)*MAXDIM;
		for (k=1;k<=IMIN(*n,MAXDIM);k++) {
			/* XOR the appropriate direction number into each component
			   of the vector and convert to a floating number. */
			ix[k] ^= iv[im+k];
			x[k]=ix[k]*fac;
		}
	}
}
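The calling convention (one initializing call with negative n, then repeated calls with positive n) can be seen in a short driver. This driver is our own illustration, not from the book's program set; it assumes sobseq is compiled together with the nrutil support files it uses.

#include <stdio.h>

void sobseq(int *n, float x[]);	/* the routine above */

int main(void)
{
	float x[3];	/* x[1..2] are used, in unit-offset style */
	int j,n = -1;
	sobseq(&n,x);	/* negative n initializes the direction numbers */
	n=2;		/* thereafter, generate 2-dimensional points */
	for (j=1;j<=5;j++) {
		sobseq(&n,x);
		printf("point %d: (%f, %f)\n",j,x[1],x[2]);
	}
	return 0;
}

The first point returned is (0.5, 0.5), since the first direction number of every sequence here is the binary fraction 0.1.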
How good is a Sobol' sequence, anyway? For Monte Carlo integration of a smooth function in n dimensions, the answer is that the fractional error will decrease with N, the number of samples, as (ln N)^n / N, i.e., almost as fast as 1/N. As an example, let us integrate a function that is nonzero inside a torus (doughnut) in three-dimensional space. If the major radius of the torus is R_0, the minor radial coordinate r is defined by

$$r = \left\{\left[(x^2 + y^2)^{1/2} - R_0\right]^2 + z^2\right\}^{1/2} \qquad (7.7.4)$$

Let us try the function

$$f(x,y,z) = \begin{cases} 1 + \cos(\pi r^2 / r_0^2) & r < r_0 \\ 0 & r \ge r_0 \end{cases} \qquad (7.7.5)$$
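A quasi-random estimate of the integral of (7.7.5) is then only a few lines of code. The driver below is our own sketch, not the book's test program; the values R0 = 3 and r0 = 1, and the bounding box [-4,4] x [-4,4] x [-1,1] (which just contains the torus), are assumptions made for the illustration.

#include <stdio.h>
#include <math.h>

#define PI 3.141592653589793

void sobseq(int *n, float x[]);	/* the routine above */

/* The integrand of equation (7.7.5). */
static double torusfunc(double x, double y, double z, double R0, double r0)
{
	double s=sqrt(x*x+y*y)-R0;
	double r=sqrt(s*s+z*z);	/* minor radial coordinate, equation (7.7.4) */
	return (r < r0) ? 1.0+cos(PI*r*r/(r0*r0)) : 0.0;
}

int main(void)
{
	const double R0=3.0,r0=1.0;	/* assumed illustrative parameters */
	const double lo[3]={-4.0,-4.0,-1.0},hi[3]={4.0,4.0,1.0};
	double vol=1.0,sum=0.0,xx,yy,zz;
	float q[4];	/* q[1..3] are used, unit-offset style */
	int d,j,n = -1,N=10000;

	for (d=0;d<3;d++) vol *= hi[d]-lo[d];
	sobseq(&n,q);	/* initialize */
	n=3;
	for (j=1;j<=N;j++) {	/* map each quasi-random triple into the box */
		sobseq(&n,q);
		xx=lo[0]+q[1]*(hi[0]-lo[0]);
		yy=lo[1]+q[2]*(hi[1]-lo[1]);
		zz=lo[2]+q[3]*(hi[2]-lo[2]);
		sum += torusfunc(xx,yy,zz,R0,r0);
	}
	printf("estimated integral = %f\n",vol*sum/N);	/* box volume times mean of f */
	return 0;
}

For these parameter choices the integral can be done analytically, 2 pi^2 R_0 r_0^2 ≈ 59.22, so the convergence is easy to monitor as N grows.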
[Figure 7.7.2 appears here: a log-log plot of fractional accuracy of the integral against N, the number of points, with curves labeled "pseudo-random, hard boundary," "pseudo-random, soft boundary," "quasi-random, hard boundary," and "quasi-random, soft boundary," and reference slopes proportional to N^(-1/2), N^(-2/3), and N^(-1).]

Figure 7.7.2. Fractional accuracy of Monte Carlo integrations as a function of number of points sampled, for two different integrands and two different methods of choosing random points. The quasi-random Sobol' sequence converges much more rapidly than a conventional pseudo-random sequence. Quasi-random sampling does better when the integrand is smooth ("soft boundary") than when it has a step ("hard boundary").