# 19.4: Further Details on Implementing Multiconfigurational Methods

## The MCSCF Method

The simultaneous optimization of the LCAO-MO and CI coefficients performed within an MCSCF calculation is a quite formidable task. The variational energy functional is a quadratic function of the CI coefficients, and so one can express the stationary conditions for these variables in the secular form:

\[ \sum\limits_J\text{H}_{I,J}C_J = \text{E}\,C_I . \nonumber \]

However, E is a quartic function of the C\(_{\nu ,i}\) coefficients because each matrix element \( \langle \Phi_I \big| \text{ H } \big| \Phi_J \rangle \) involves one- and two-electron integrals over the MOs \(\phi_i\), and the two-electron integrals depend quartically on the \(C_{\nu ,i}\) coefficients. The stationary conditions with respect to these \(C_{\nu ,i}\) parameters must be solved iteratively because of this quartic dependence.
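The CI part of this stationary problem, the secular equation above, is linear in the \(C_J\) and can be posed as an ordinary symmetric eigenproblem. A minimal numerical sketch (the \(H_{I,J}\) values below are toy numbers over three hypothetical CSFs, not from any real system):

```python
import numpy as np

# Toy CI secular problem: a small symmetric Hamiltonian matrix H_IJ over
# three hypothetical CSFs (illustrative values only).
H = np.array([[-1.50, 0.10, 0.02],
              [ 0.10, -0.90, 0.05],
              [ 0.02,  0.05, -0.40]])

# Solving sum_J H_IJ C_J = E C_I is a symmetric eigenvalue problem.
E, C = np.linalg.eigh(H)      # eigenvalues ascending

ground_E = E[0]               # lowest root
ground_C = C[:, 0]            # corresponding CI vector

# Verify the secular (stationary) condition for the lowest root.
residual = np.linalg.norm(H @ ground_C - ground_E * ground_C)
```

Note that the lowest root lies below the smallest diagonal element, reflecting the variational lowering produced by CSF mixing.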

It is well known that minimization of a function (E) of several non-linear parameters (the \(C_{\nu ,i}\)) is a difficult task that can suffer from poor convergence and may locate local rather than global minima. In an MCSCF wavefunction containing many CSFs, the energy is only weakly dependent on the orbitals that are weakly occupied (i.e., those that appear in CSFs with small \(C_I\) values); in contrast, E is strongly dependent on the \(C_{\nu ,i}\) coefficients of those orbitals that appear in the CSFs with larger \(C_I\) values. One is therefore faced with minimizing a function of many variables (there may be as many \(C_{\nu ,i}\) as the square of the number of orbital basis functions) that depends strongly on several of the variables and weakly on many others. This is a very difficult job.

For these reasons, in the MCSCF method, the number of CSFs is usually kept to a small to moderate number (e.g., a few to several hundred) chosen to describe essential correlations (i.e., configuration crossings, proper dissociation) and important dynamical correlations (those electron-pair correlations of angular, radial, left-right, etc. nature that arise when low-lying 'virtual' orbitals are present). In such a compact wavefunction, only spin-orbitals with reasonably large occupations (e.g., as characterized by the diagonal elements of the one-particle density matrix \(\gamma_{i,j}\)) appear. As a result, the energy functional is expressed in terms of variables on which it is strongly dependent, in which case the nonlinear optimization process is less likely to be pathological.

Such a compact MCSCF wavefunction is designed to provide a good description of the set of strongly occupied spin-orbitals and of the CI amplitudes for CSFs in which only these spin-orbitals appear. It, of course, provides no information about the spin-orbitals that are not used to form the CSFs on which the MCSCF calculation is based. As a result, the MCSCF energy is invariant to a unitary transformation among these 'virtual' orbitals.

In addition to the references mentioned earlier in ACP and MTC, the following papers describe several of the advances that have been made in the MCSCF method, especially with respect to enhancing its rate and range of convergence: E. Dalgaard and P. Jørgensen, J. Chem. Phys. 69, 3833 (1978); H. J. Aa. Jensen, P. Jørgensen, and H. Ågren, J. Chem. Phys. 87, 457 (1987); B. H. Lengsfield, III and B. Liu, J. Chem. Phys. 75, 478 (1981).

## The Configuration Interaction Method

In the Configuration Interaction (CI) method, one usually attempts to realize a high-level treatment of electron correlation. A set of orthonormal molecular orbitals is first obtained from an SCF or MCSCF calculation (usually involving a small to moderate list of CSFs). The LCAO-MO coefficients of these orbitals are **no longer considered** as variational parameters in the subsequent CI calculation; only the \(C_I\) coefficients are to be further optimized.

The CI wavefunction

\[ \psi = \sum\limits_J C_J \Phi_J \nonumber \]

is most commonly constructed from CSFs \(\Phi_J\) that include:

- All of the CSFs in the SCF (in which case only a single CSF is included) or MCSCF wavefunction used to generate the molecular orbitals \(\phi_i\). This set of CSFs is referred to as spanning the '**reference space**' of the subsequent CI calculation, and the particular combination of these CSFs used in the orbital optimization (i.e., the SCF or MCSCF wavefunction) is called the **reference function**.
- CSFs generated by carrying out single, double, triple, etc. level 'excitations' (i.e., orbital replacements) relative to reference CSFs. CI wavefunctions limited to include contributions through various levels of excitation (e.g., single, double, etc.) are denoted S (singly excited), D (doubly), SD (singly and doubly), SDT (singly, doubly, and triply), and so on.

The orbitals from which electrons are removed and those into which electrons are excited can be restricted to focus attention on correlations among certain orbitals. For example, if excitations out of core electrons are excluded, one computes a total energy that contains no correlation corrections for these core orbitals. Often it is possible to limit the nature of the orbital excitations so as to focus on the energetic quantities of interest (e.g., the CC bond breaking in ethane requires correlation of the \(\sigma_{CC}\) orbital, but the carbon 1s core orbitals and the CH bond orbitals may be treated in a non-correlated manner).

Clearly, the number of CSFs included in the CI calculation can be far in excess of the number considered in typical MCSCF calculations; CI wavefunctions including 5,000 to 50,000 CSFs are routinely used, and functions with one to **several million** CSFs are within the realm of practicality (see, for example, J. Olsen, B. Roos, Poul Jørgensen, and H. J. Aa. Jensen, J. Chem. Phys. 89 , 2185 (1988) and J. Olsen, P. Jørgensen, and J. Simons, Chem. Phys. Letters 169 , 463 (1990)).

The need for such large CSF expansions should not come as a surprise once one considers that (i) each electron pair requires **at least** two CSFs (let us say it requires P of them, on average, a dominant one and P-1 others which are doubly excited) to form polarized orbital pairs, (ii) there are of the order of N(N-1)/2 = X electron pairs in an atom or molecule containing N electrons, and (iii) the number of terms in the CI wavefunction scales as \(P^X\). So, for an H\(_2\)O molecule containing ten electrons, there would be P\(^{45}\) terms in the CI expansion. This is 3.5 x10\(^{13}\) terms if P=2 and 3.0 x10\(^{21}\) terms if P=3. Undoubtedly, this is an overestimate of the number of CSFs needed to describe electron correlation in H\(_2\)O, but it demonstrates how rapidly the number of CSFs can grow with the number of electrons in the system.
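The pair-count estimate in (i)-(iii) is easy to evaluate directly; a short sketch (the function name is ours, purely illustrative):

```python
# Rough count of CSFs needed if each of the X = N(N-1)/2 electron pairs
# contributes, on average, P CSFs (the scaling estimate from the text).
def csf_estimate(n_electrons, P):
    X = n_electrons * (n_electrons - 1) // 2   # number of electron pairs
    return X, P ** X

X, n_csf_P2 = csf_estimate(10, 2)   # H2O: ten electrons, P = 2
_, n_csf_P3 = csf_estimate(10, 3)   # same molecule, P = 3
```

For N = 10 this gives X = 45 pairs, with 2\(^{45}\) ≈ 3.5 x 10\(^{13}\) and 3\(^{45}\) ≈ 3.0 x 10\(^{21}\) terms.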

The \(H_{I,J}\) matrices that arise in CI calculations are evaluated in terms of one- and two-electron integrals over the molecular orbitals using the equivalent of the Slater-Condon rules. For large CI calculations, the full \(H_{I,J}\) matrix is not actually evaluated and stored in the computer's memory (or on its disk); rather, so-called 'direct CI' methods (see the article by Roos and Siegbahn in MTC) are used to compute and immediately sum contributions to the sum \( \sum\limits_J \text{H}_{I,J}C_J \) in terms of integrals, density matrix elements, and approximate values of the \(C_J\) amplitudes. Iterative methods (see, for example, E. R. Davidson, J. Comput. Phys. 17 , 87 (1975)), in which approximate values for the \(C_J\) coefficients and energy E are refined through sequential application of \(\text{H}_{I,J}\) to the preceding estimate of the \(C_J\) vector, are employed to solve these large CI matrix eigenvalue problems.
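A minimal sketch of such an iterative scheme is given below: a bare-bones Davidson iteration on a model diagonally dominant Hamiltonian, with the \(\sigma = HC\) product isolated in one routine as a stand-in for the integral-driven assembly a real direct-CI code would use (all matrix entries are made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Model "CI Hamiltonian": symmetric and diagonally dominant, as H_IJ
# matrices built from CSFs typically are (illustrative, not a real system).
H = np.diag(np.linspace(-2.0, 28.0, n)) + 1e-2 * rng.standard_normal((n, n))
H = 0.5 * (H + H.T)
diag = np.diag(H).copy()

def sigma(c):
    # In a true direct-CI code this product is assembled from integrals and
    # density-matrix elements without ever storing H; here we just multiply.
    return H @ c

# Minimal Davidson iteration for the lowest root (a sketch, not production code).
V = np.zeros((n, 0))
v = np.zeros(n); v[0] = 1.0
theta, x = 0.0, v
for _ in range(40):
    if V.shape[1]:                      # orthogonalize against the subspace
        v = v - V @ (V.T @ v)
    nv = np.linalg.norm(v)
    if nv < 1e-10:
        break                           # new direction linearly dependent
    V = np.column_stack([V, v / nv])
    # Rayleigh-Ritz step in the small subspace.
    S = V.T @ np.column_stack([sigma(V[:, k]) for k in range(V.shape[1])])
    vals, vecs = np.linalg.eigh(0.5 * (S + S.T))
    theta = vals[0]
    x = V @ vecs[:, 0]
    r = sigma(x) - theta * x            # residual of H C = E C
    if np.linalg.norm(r) < 1e-9:
        break
    denom = theta - diag
    denom[np.abs(denom) < 1e-2] = 1e-2  # guard the Davidson preconditioner
    v = r / denom

E_exact = np.linalg.eigvalsh(H)[0]      # reference answer (full diagonalization)
```

The key point is that the loop touches H only through matrix-vector products, which is what makes very large CI dimensions tractable.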

## The MPPT/MBPT Method

In the MPPT/MBPT method, once the reference CSF is chosen and the SCF orbitals belonging to this CSF are determined, the wavefunction \(\psi\) and energy E are determined in an order-by-order manner. This is one of the primary strengths of the MPPT/MBPT technique; it does not require one to make further (potentially arbitrary) choices once the basis set and dominant (SCF) configuration are specified. In contrast to the MCSCF and CI treatments, one need not make choices of CSFs to include in or exclude from \(\psi\). The MPPT/MBPT perturbation equations determine what CSFs must be included through any particular order.

For example, the first-order wavefunction correction \(\psi^1 \text {(i.e., } \psi = \Phi + \psi^1\) through first order) is given by:

\[ \begin{align} \psi^1 &= -\sum\limits_{i<j,m<n} \dfrac{\langle \Phi_{i,j}^{m,n}\big| \text{ H - H}^0 \big| \Phi \rangle}{\epsilon_m - \epsilon_i + \epsilon_n - \epsilon_j} \bigg| \Phi_{i,j}^{m,n} \rangle \\ &= - \sum\limits_{i<j,m<n} \dfrac{\langle i,j \big| g \big| m,n \rangle - \langle i,j \big| g \big| n,m \rangle}{\epsilon_m - \epsilon_i + \epsilon_n - \epsilon_j}\bigg| \Phi_{i,j}^{m,n} \rangle \end{align} \nonumber \]

where the SCF orbital energies are denoted \(\epsilon_k\) and \(\Phi_{i,j}^{m,n}\) represents a CSF that is **doubly excited** relative to \(\Phi\). Thus, only doubly excited CSFs contribute to the **first-order wavefunction**; as a result, the energy E is given through second order as:

\[ \begin{align} E &= \langle \Phi \big| \text{H}^0 \big| \Phi \rangle + \langle \Phi \big| \text{ H - H}^0 \big| \Phi \rangle + \langle \Phi \big| \text{ H - H}^0 \big| \psi^1 \rangle \\ &= \langle \Phi \big| \text{ H } \big| \Phi \rangle - \sum\limits_{i<j,m<n} \dfrac{\big| \langle \Phi_{i,j}^{m,n} \big| \text{ H - H}^0 \big| \Phi \rangle \big|^2}{\epsilon_m - \epsilon_i + \epsilon_n -\epsilon_j} \\ &= \text{E}_{\text{SCF}} - \sum\limits_{i<j,m<n} \dfrac{\big| \langle i,j \big| g \big| m,n \rangle - \langle i,j \big| g \big| n,m \rangle \big|^2}{\epsilon_m - \epsilon_i + \epsilon_n - \epsilon_j} \\ &= \text{E}^0 + \text{E}^1 + \text{E}^2. \end{align} \nonumber \]

These contributions have been expressed, using the SC rules, in terms of the two-electron integrals \( \langle \text{i,j} \big| g \big|\text{m,n} \rangle \) coupling the excited spin-orbitals to the spin-orbitals from which electrons were excited as well as the orbital energy differences \( [ \epsilon_m - \epsilon_i + \epsilon_n - \epsilon_j ] \) accompanying such excitations. In this form, it becomes clear that major contributions to the correlation energy of the pair of occupied orbitals \(\phi_i \phi_j\) are made by double excitations into virtual orbitals \(\phi_m \phi_n\) that have large coupling (i.e., large \( \langle i,j \big| \text{ g }\big| m,n \rangle \) integrals) and small orbital energy gaps, \( [ \epsilon_m - \epsilon_i + \epsilon_n - \epsilon_j ] \).
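The second-order energy expression can be transcribed almost literally into code. In this sketch the orbital energies and the integrals \(\langle i,j|g|m,n\rangle\) are made-up numbers (a real calculation would supply them from an SCF step); only the summation structure is the point:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical model: 4 occupied (i, j) and 6 virtual (m, n) spin-orbitals
# with illustrative orbital energies and random numbers standing in for the
# two-electron integrals <i,j|g|m,n>.
eps_occ = np.array([-1.2, -0.9, -0.7, -0.5])
eps_vir = np.array([0.3, 0.5, 0.8, 1.1, 1.4, 1.9])
g = 0.1 * rng.standard_normal((4, 4, 6, 6))     # g[i, j, m, n]

E2 = 0.0
for i in range(4):
    for j in range(i + 1, 4):            # i < j
        for m in range(6):
            for n in range(m + 1, 6):    # m < n
                num = g[i, j, m, n] - g[i, j, n, m]   # <ij|g|mn> - <ij|g|nm>
                gap = eps_vir[m] - eps_occ[i] + eps_vir[n] - eps_occ[j]
                E2 -= num ** 2 / gap     # second-order (MP2) contribution
```

Because every denominator is positive for a ground-state reference, each term lowers the energy, so E\(^2\) is necessarily negative.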

In higher order corrections to the wavefunction and to the energy, contributions from CSFs that are singly, triply, etc. excited relative to \(\Phi\) appear, and additional contributions from the doubly excited CSFs also enter. It is relatively common to carry MPPT/MBPT calculations (see the references given above in Chapter 19.I.3 where the contributions of the Pople and Bartlett groups to the development of MPPT/MBPT are documented) through to third order in the energy (whose evaluation can be shown to require only \(\psi^0 \text{ and } \psi^1\)). The entire GAUSSIAN-8X series of programs, which have been used in thousands of important chemical studies, calculate E through third order in this manner.

In addition to being size-extensive and not requiring one to specify input beyond the basis set and the dominant CSF, the MPPT/MBPT approach is able to include the effect of **all** CSFs (that contribute to any given order) without having to find any eigenvalues of a matrix. This is an important advantage because matrix eigenvalue determination, which is necessary in MCSCF and CI calculations, requires computer time in proportion to the third power of the dimension of the \(\text{H}_{I,J}\) matrix. Despite all of these advantages, it is important to remember the primary disadvantages of the MPPT/MBPT approach; its energy is not an upper bound to the true energy and it may not be able to treat cases for which two or more CSFs have equal or nearly equal amplitudes because it obtains the amplitudes of all but the dominant CSF from perturbation theory formulas that assume the perturbation is 'small'.

## The Coupled-Cluster Method

The implementation of the Coupled-Cluster method begins much as in the MPPT/MBPT case; one selects a reference CSF that is used in the SCF process to generate a set of spin-orbitals to be used in the subsequent correlated calculation. The set of working equations of the Coupled-Cluster technique can be written explicitly by introducing the form of the so-called cluster operator T,

\[ \text{T} = \sum\limits_{i,m}t_i^m m^+ i + \sum\limits_{i,j,m,n}t_{i,j}^{m,n}m^+ n^+ j\, i + ... , \nonumber \]

where the combination of operators \(m^+ i\) denotes creation of an electron in virtual spin-orbital \(\phi_m\) and removal of an electron from occupied spin-orbital \(\phi_i\) to generate a single excitation. The operation \(m^+ n^+ j\, i\) therefore represents a double excitation from \(\phi_i \phi_j\) to \(\phi_m \phi_n\). Expressing the cluster operator T in terms of the amplitudes \(t_i^m \text{, } t_{i,j}^{m,n} \text{, }\) etc. for singly, doubly, etc. excited CSFs, and expanding the exponential operators in

\[e^{-T} \text{ H } e^{T} \nonumber \]

one obtains:

\[ \langle \Phi_i^m \big| \text{ H + [H,T] + }\dfrac{1}{2} \text{[[H,T],T] + }\dfrac{1}{6}\text{[[[H,T],T],T] + }\dfrac{1}{24} \text{[[[[H,T],T],T],T] }\big| \Phi \rangle = 0; \nonumber \]

\[ \langle \Phi_{i,j}^{m,n} \big| \text{ H + [H,T] + }\dfrac{1}{2} \text{[[H,T],T] + }\dfrac{1}{6}\text{[[[H,T],T],T] + }\dfrac{1}{24} \text{[[[[H,T],T],T],T] }\big| \Phi \rangle = 0; \nonumber \]

\[ \langle \Phi_{i,j,k}^{m,n,p} \big| \text{ H + [H,T] + }\dfrac{1}{2} \text{[[H,T],T] + }\dfrac{1}{6}\text{[[[H,T],T],T] + }\dfrac{1}{24} \text{[[[[H,T],T],T],T] }\big| \Phi \rangle = 0; \nonumber \]

and so on for higher order excited CSFs. It can be shown, because of the one- and two-electron operator nature of H, that the expansion of the exponential operators truncates exactly at the fourth power; that is, terms such as [[[[[H,T],T],T],T],T] and higher commutators vanish identically (this is demonstrated in Chapter 4 of **Second Quantization Based Methods in Quantum Chemistry**, P. Jørgensen and J. Simons, Academic Press, New York (1981)).
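The action of the operator strings \(m^+ i\) and \(m^+ n^+ j\, i\) on a determinant can be made concrete by representing the determinant as an ordered tuple of occupied spin-orbital indices and tracking the fermionic sign. A small sketch, using the common convention that each creation or annihilation operator contributes a factor \((-1)^p\), with p the number of occupied orbitals preceding its index:

```python
# Determinants as ascending tuples of occupied spin-orbital indices.
def annihilate(det, i):
    """Apply a_i; returns (new determinant, phase) or (None, 0) if i is empty."""
    if i not in det:
        return None, 0
    pos = det.index(i)                      # occupied orbitals preceding i
    return det[:pos] + det[pos + 1:], (-1) ** pos

def create(det, m):
    """Apply a_m^+; returns (new determinant, phase) or (None, 0) if m is filled."""
    if m in det:
        return None, 0
    pos = sum(1 for q in det if q < m)      # occupied orbitals preceding m
    return tuple(sorted(det + (m,))), (-1) ** pos

def excite(det, i, m):
    """Apply the string m^+ i to |det>, as in the single-excitation part of T."""
    d1, s1 = annihilate(det, i)
    if d1 is None:
        return None, 0
    d2, s2 = create(d1, m)
    if d2 is None:
        return None, 0
    return d2, s1 * s2
```

For example, acting with \(5^+ 1\) on the determinant occupying spin-orbitals (0, 1, 2, 3) yields the singly excited determinant (0, 2, 3, 5) with an overall phase of +1.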

As a result, the exact Coupled-Cluster equations are **quartic equations** for the \(t_i^m \text{, } t_{i,j}^{m,n} \text{, }\) etc. amplitudes. Although it is a rather formidable task to evaluate all of the commutator matrix elements appearing in the above Coupled-Cluster equations, it can be and has been done (the references given above to Purvis and Bartlett are especially relevant in this context). The result is to express each such matrix element, via the Slater-Condon rules, in terms of one- and two-electron integrals over the spin-orbitals used in determining \(\Phi\), including those in \(\Phi\) itself and the 'virtual' orbitals not in \(\Phi\).

In general, these quartic equations must then be solved in an iterative manner and are susceptible to convergence difficulties that are similar to those that arise in MCSCF-type calculations. In any such iterative process, it is important to start with an approximation (to the t amplitudes, in this case) which is reasonably close to the final converged result. Such an approximation is often achieved, for example, by neglecting all of the terms that are nonlinear in the t amplitudes (because these amplitudes are assumed to be less than unity in magnitude). This leads, for the Coupled-Cluster working equations obtained by projecting onto the doubly excited CSFs, to:

\[ \langle \text{ i,j } \big| \text{ g } \big| \text{ m,n } \rangle ' + \left[ \epsilon_m -\epsilon_i +\epsilon_n -\epsilon_j \right] t_{i,j}^{m,n} + \sum\limits_{i',j',m',n'} \langle \Phi_{i,j}^{m,n} \big| \text{ H - H}^0 \big| \Phi_{i',j'}^{m',n'} \rangle t_{i',j'}^{m',n'} = 0, \nonumber \]

where the primed integral \( \langle \text{ i,j }\big| \text{ g }\big| \text{ m,n }\rangle ' \) is used to denote the two-electron integral difference \( \langle \text{ i,j } \big| \text{ g } \big| \text{ m,n } \rangle - \langle \text{ i,j } \big| \text{ g } \big| \text{ n,m } \rangle \). If, in addition, the factors that couple different doubly excited CSFs are ignored (i.e., the sum over i',j',m',n'), the equations for the t amplitudes reduce to the equations for the CSF amplitudes of the first-order MPPT/MBPT wavefunction:

\[ t_{i,j}^{m,n} = - \dfrac{\langle \text{ i,j }\big| \text{ g } \big| \text{ m,n } \rangle '}{\epsilon_m -\epsilon_i +\epsilon_n -\epsilon_j} \nonumber \]

As Bartlett and Pople have both demonstrated, there is, in fact, a close relationship between the MPPT/MBPT and Coupled-Cluster methods when the Coupled-Cluster equations are solved iteratively starting with such an MPPT/MBPT-like initial 'guess' for these double-excitation amplitudes.
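That relationship can be illustrated on a toy linearized model of the projected equations, \(g' + D\,t + M\,t = 0\), iterated Jacobi-style from the MP2-like guess \(t = -g'/D\). All numbers below are illustrative: D plays the role of the diagonal of orbital-energy gaps and M the (small) coupling block between different doubly excited CSFs:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 50   # number of double-excitation amplitudes, flattened into one vector

# Toy linearized model of the projected Coupled-Cluster equations.
gaps = np.linspace(1.0, 6.0, K)          # eps_m - eps_i + eps_n - eps_j > 0
gp = 0.2 * rng.standard_normal(K)        # primed integrals <i,j|g|m,n>'
M = 0.05 * rng.standard_normal((K, K))   # CSF-CSF coupling, kept small

t = -gp / gaps                            # MP2-like initial guess
for _ in range(200):
    t_new = -(gp + M @ t) / gaps         # Jacobi-style amplitude update
    if np.linalg.norm(t_new - t) < 1e-12:
        t = t_new
        break
    t = t_new

# Residual of the working equation g' + D t + M t = 0 at convergence.
residual = np.linalg.norm(gp + gaps * t + M @ t)
t_direct = -np.linalg.solve(np.diag(gaps) + M, gp)   # direct reference solve
```

The iteration converges because the coupling block is small relative to the energy gaps; its fixed point coincides with the direct linear-algebra solution, and the very first iterate is exactly the MPPT/MBPT first-order amplitude set.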

The Coupled-Cluster method, as presented here, suffers from the same drawbacks as the MPPT/MBPT approach; its energy is not an upper bound and it may not be able to accurately describe wavefunctions which have two or more CSFs with approximately equal amplitude. Moreover, solution of the non-linear Coupled-Cluster equations may be difficult and slowly (if at all) convergent. It has the same advantages as the MPPT/MBPT method; its energy is size-extensive, it requires no large matrix eigenvalue solution, and its energy and wavefunction are determined once one specifies the basis and the dominant CSF.

## Density Functional Methods

These approaches provide alternatives to the conventional tools of quantum chemistry. The CI, MCSCF, MPPT/MBPT, and CC methods move beyond the single-configuration picture by adding to the wave function more configurations whose amplitudes they each determine in their own way. This can lead to a very large number of CSFs in the correlated wave function, and, as a result, a need for extraordinary computer resources.

The density functional approaches are different. Here one solves a set of orbital-level equations

\[ \left[ -\dfrac{\hbar^2}{2m_e}\nabla^2 -\sum\limits_A \dfrac{Z_Ae^2}{|\textbf{r} - \textbf{R}_A|} + \int\dfrac{\rho (\textbf{r}')e^2}{|\textbf{r}-\textbf{r}'|}d\textbf{r}' + \text{U}(\textbf{r}) \right] \phi_i = \epsilon_i\phi_i \nonumber \]

in which the orbitals {\(\phi_i\)} 'feel' potentials due to the nuclear centers (having charges Z\(_{\text{A}}\)), Coulombic interaction with the **total** electron density \(\rho\)(**r**'), and a so-called **exchange-correlation** potential denoted U(**r**). The particular electronic state for which the calculation is being performed is specified by forming a corresponding density \(\rho\)(**r**). Before going further in describing how DFT calculations are carried out, let us examine the origins underlying this theory.

The so-called Hohenberg-Kohn theorem states that the **ground-state** electron density \(\rho\)(**r**) describing an N-electron system uniquely determines the potential V(**r**) in the Hamiltonian

\[ \text{H } = \sum\limits_j \left[ -\dfrac{\hbar^2}{2m_e}\nabla_j^2 + \text{V(}\textbf{r}_j\text{)} \right] + \dfrac{e^2}{2} \sum\limits_{k \ne j} \dfrac{1}{r_{j,k}} , \nonumber \]

and, because H determines the ground-state energy and wave function of the system, the ground-state density \(\rho (\textbf{r})\) determines the ground-state properties of the system. The proof of this theorem proceeds as follows:

- \(\rho (\textbf{r}) \text{ determines N because } \int \rho (\textbf{r}) d^3r = \text{N} .\)
- Assume that there are two distinct potentials (aside from an additive constant that simply shifts the zero of total energy) V(**r**) and V'(**r**) which, when used in H and H', respectively, to solve for a ground state produce \[ \text{E}_0 \text{, } \psi \text{(}\textbf{r}\text{) and E}_0 ' \text{, } \psi \text{'(}\textbf{r}\text{)} \nonumber \] that have the same one-electron density: \[ \int \big| \psi \big|^2 \text{dr}_2 \text{ dr}_3 \text{ ... dr}_{\text{N}} = \rho (\textbf{r}) = \int \big|\psi '\big|^2 \text{dr}_2 \text{ dr}_3 \text{ ... dr}_{\text{N}} \nonumber \]
- If we think of \(\psi '\) as a trial variational wave function for the Hamiltonian H, we know that \[ \text{E}_0 < \langle \psi ' \big| \text{H}\big| \psi '\rangle = \langle \psi '\big| \text{H'} \big| \psi ' \rangle + \int \rho(\textbf{r})\left[V(\textbf{r}) - V'(\textbf{r}) \right]\text{d}^3\textbf{r} = \text{E}_0 ' + \int \rho (\textbf{r}) \left[ V(\textbf{r}) - V'(\textbf{r}) \right]\text{d}^3\textbf{r} \label{B3} \]
- Similarly, taking \(\psi\) as a trial function for the H' Hamiltonian, one finds that \[ \text{E}_0 ' < \text{E}_0 + \int \rho(\textbf{r}) \left[ V'(\textbf{r}) - V(\textbf{r}) \right] \text{d}^3\textbf{r}. \label{B4} \]
- Adding Equations \ref{B3} and \ref{B4} gives \[ \text{E}_0 + \text{E}_0 ' < \text{E}_0 ' + \text{E}_0 , \nonumber \] a clear contradiction.

Hence, there cannot be two distinct potentials V and V' that give the same ground-state \(\rho (\textbf{r})\). So, the ground-state density \(\rho (\textbf{r})\) uniquely determines N and V, and thus H, and therefore \(\psi \text{ and E}_0\). Furthermore, because \(\psi\) determines all properties of the ground state, \(\rho (\textbf{r})\), in principle, determines all such properties. This means that even the kinetic energy and the electron-electron interaction energy of the ground state are determined by \(\rho (\textbf{r})\). It is easy to see that \( \int \rho (\textbf{r}) \text{V(}\textbf{r}\text{)d}^3\textbf{r} = \text{V[}\rho ]\) gives the average value of the electron-nuclear (plus any additional one-electron additive potential) interaction in terms of the ground-state density \(\rho (\textbf{r})\), but how are the kinetic energy T[\(\rho\)] and the electron-electron interaction energy V\(_{ee}\text{[}\rho\)] expressed in terms of \(\rho\)?

The main difficulty with DFT is that the **Hohenberg-Kohn theorem** shows that the **ground-state** values of T, V\(_{ee}\), V, etc. are all unique functionals of the **ground-state** \(\rho\) (i.e., that they can, in principle, be determined once \(\rho\) is given), but it does not tell us what these functional relations are.

To see how it might make sense that a property such as the kinetic energy, whose operator \( -\frac{\hbar^2}{2m_e}\nabla^2 \) involves derivatives, can be related to the electron density, consider a simple system of N non-interacting electrons moving in a three-dimensional cubic “box” potential. The energy states of such electrons are known to be

\[ E = \left( \dfrac{h^2}{8m_e L^2} \right) \left( n_x^2 + n_y^2 + n_z^2 \right), \nonumber \]

where L is the length of the box along the three axes, and \(n_x \text{, } n_y \text{, and } n_z\) are the quantum numbers describing the state. We can view \(n_x^2 \text{ + } n_y^2\text{ + } n_z^2 = R^2\) as defining the squared radius of a sphere in three dimensions, and we realize that the density of quantum states in this space is one state per unit volume in the \(n_x \text{, } n_y \text{, } n_z\) space. Because \(n_x \text{, } n_y \text{, and } n_z\) must be positive integers, the volume covering all states with energy less than or equal to a specified energy E = \(\left(\frac{h^2}{8m_eL^2}\right)R^2 \text{ is } \frac{1}{8}\) the volume of the sphere of radius R:

\[ \Phi (\text{E}) = \dfrac{1}{8}\left( \dfrac{4\pi}{3}\right)R^3 = \left( \dfrac{\pi}{6}\right) \sqrt{8m_e E\dfrac{L^2}{h^2}}^3 . \nonumber \]

Since there is one state per unit of such volume, \(\Phi\)(E) is also the number of states with energy less than or equal to E, and is called the **integrated density of states**. The number of states g(E) dE with energy between E and E+dE, the **density of states** , is the derivative of \(\Phi\):

\[ g(E) = \dfrac{d\Phi}{dE} = \dfrac{\pi}{4}\sqrt{8m_e\dfrac{L^2}{h^2}}^3 \sqrt{E}. \nonumber \]
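The counting argument behind \(\Phi\)(E) can be checked by brute force: enumerate the integer triples \((n_x, n_y, n_z)\) directly and compare with \((\pi/6)R^3\). The ratio approaches one from below as R grows, because the continuum estimate slightly overcounts states lying near the coordinate planes (where \(n_i = 0\) is excluded):

```python
import numpy as np

def ratio_counted_to_phi(R):
    """Exact lattice count of states with E <= R^2, in units where
    h^2/(8 m_e L^2) = 1, divided by the continuum estimate (pi/6) R^3."""
    n = np.arange(1, int(R) + 1)                 # positive integers only
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    count = int(np.count_nonzero(nx**2 + ny**2 + nz**2 <= R * R))
    phi = (np.pi / 6.0) * R**3
    return count / phi

r20 = ratio_counted_to_phi(20.0)
r60 = ratio_counted_to_phi(60.0)
```

The agreement improves steadily with R, which is why the continuum formula is reliable in the many-electron (large N, large R) regime used below.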

If we calculate the total energy for N electrons, with the states having energies up to the so-called **Fermi energy** (i.e., the energy of the highest occupied molecular orbital HOMO) doubly occupied, we obtain the ground-state energy:

\[ E_0 = 2\int\limits_0^{E_F}g(E)EdE = \dfrac{8\pi}{5}\sqrt{\dfrac{2m_e}{h^2}}^3L^3\sqrt{E_F}^5 . \nonumber \]

The total number of electrons N can be expressed as

\[ N = 2\int\limits_0^{E_F}g(E)dE = \dfrac{8\pi}{3}\sqrt{\dfrac{2m_e}{h^2}}^3 L^3 \sqrt{E_F}^3 , \nonumber \]

which can be solved for \(E_F\) in terms of N to then express \(E_0\) in terms of N instead of \(E_F\):

\[ E_0 = \dfrac{3h^2}{10m_e}\sqrt[3]{\dfrac{3}{8\pi}}^2L^3 \sqrt[3]{\dfrac{N}{L^3}}^5 . \nonumber \]

This gives the total energy, which is also the kinetic energy in this case because the potential energy is zero within the “box”, in terms of the electron density \(\rho\)(x,y,z) = (\(\frac{N}{L^3}\) ). It therefore may be plausible to express kinetic energies in terms of electron densities \(\rho(\textbf{r}\)), but it is by no means clear how to do so for “real” atoms and molecules with electron-nuclear and electron-electron interactions operative.

In one of the earliest DFT models, the **Thomas-Fermi** theory, the kinetic energy of an atom or molecule is approximated using the above kind of treatment on a “local” level. That is, for each volume element in **r** space, one assumes the expression given above to be valid, and then one integrates over all **r** to compute the total kinetic energy:

\[ T_{TF}[\rho ] = \int \dfrac{3h^2}{10m_e}\sqrt[3]{\dfrac{3}{8\pi}}^2 \sqrt[3]{\rho(\textbf{r})}^5 \text{d}^3\textbf{r} = C_F\int \sqrt[3]{\rho (\textbf{r})}^5 \text{d}^3\textbf{r} , \nonumber \]

where the last equality simply defines the C\(_F\) constant (which is 2.8712 in atomic units). Ignoring the correlation and exchange contributions to the total energy, this T is combined with the electron-nuclear V and Coulombic electron-electron potential energies to give the Thomas-Fermi total energy:

\[ E_{0,TF}[\rho ] = C_F \int \sqrt[3]{\rho (\textbf{r})}^5d^3\textbf{r} + \int V(\textbf{r})\rho (\textbf{r}) d^3\textbf{r} + \dfrac{e^2}{2}\int \rho (\textbf{r})\dfrac{\rho (\textbf{r}')}{|\textbf{r}-\textbf{r}' |}d^3\textbf{r} d^3\textbf{r}' . \nonumber \]

This expression is an example of how E\(_0\) is given as a **local density functional approximation** (LDA). The term local means that the energy is given as a functional (i.e., a function of \(\rho\)) that depends on the value of \(\rho(\textbf{r})\) at each point in space separately, not on \(\rho\) at more than one point at a time.
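Evaluating \(T_{TF}[\rho]\) for a known density makes the local approximation concrete. For the hydrogen-atom 1s density (atomic units) the integral has a closed form, and the Thomas-Fermi value, about 0.29 hartree, falls well short of the exact kinetic energy of 0.5 hartree, illustrating why the approximate kinetic-energy treatment limits the theory's accuracy:

```python
import numpy as np

# Thomas-Fermi kinetic energy (atomic units) for the hydrogen 1s density
# rho(r) = exp(-2 r)/pi, whose exact kinetic energy is 0.5 hartree.
C_F = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)   # = 2.8712 a.u.

dr = 1.0e-4
r = np.arange(dr / 2, 25.0, dr)               # midpoint radial grid
rho = np.exp(-2.0 * r) / np.pi

# T_TF[rho] = C_F * integral of rho^(5/3) d^3r, done as a radial integral.
T_TF = C_F * np.sum(rho ** (5.0 / 3.0) * 4.0 * np.pi * r**2) * dr

# Closed form of the same integral:
#   (1/pi)^(5/3) * 4 pi * 2/(10/3)^3
T_closed = C_F * (1.0 / np.pi) ** (5.0 / 3.0) * 4.0 * np.pi * 2.0 / (10.0 / 3.0) ** 3
```

The sizable shortfall relative to 0.5 hartree is typical of the bare Thomas-Fermi kinetic functional on strongly varying atomic densities.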

Unfortunately, the Thomas-Fermi energy functional does not produce results that are of sufficiently high accuracy to be of great use in chemistry. What is missing in this theory are (a) the exchange energy and (b) the correlation energy; moreover, the kinetic energy is treated only in the approximate manner described.

In the book by Parr and Yang, it is shown how Dirac was able to address the exchange energy for the 'uniform electron gas' (N Coulomb **interacting** electrons moving in a uniform positive background charge whose magnitude balances the charge of the N electrons). If the exact expression for the exchange energy of the uniform electron gas is applied on a local level, one obtains the commonly used Dirac **local density approximation to the exchange energy**:

\[ \text{E}_{\text{ex, Dirac}}[\rho ] = - C_x \int \sqrt[3]{\rho (\textbf{r})}^4 d^3\textbf{r} , \nonumber \]

with \(C_x = \frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3} = 0.7386\) in atomic units. Adding this exchange energy to the Thomas-Fermi total energy E\(_{0,TF} [\rho]\) gives the so-called Thomas-Fermi-Dirac (TFD) energy functional.
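The Dirac constant can be verified numerically, and the LDA exchange functional can be tested against a case where the exact exchange energy is known. In the sketch below, the hydrogen 1s density is again used as an illustrative choice (not from the text); for this one-electron state the exact exchange (self-interaction) energy is \(-5/16 = -0.3125\) hartree.

```python
import numpy as np

# Dirac exchange constant C_x = (3/4)(3/pi)^(1/3) in atomic units
C_x = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)
print(C_x)  # ~0.7386

# Illustrative test: Dirac LDA exchange for the hydrogen 1s density
# rho = exp(-2r)/pi; the exact exchange energy is -5/16 = -0.3125 Ha
r = np.linspace(1e-6, 20.0, 200001)
dr = r[1] - r[0]
rho = np.exp(-2.0 * r) / np.pi
E_x = -C_x * np.sum(rho ** (4.0 / 3.0) * 4.0 * np.pi * r**2) * dr
print(E_x)  # ~ -0.21, smaller in magnitude than the exact value
```

The LDA underestimates the magnitude of the exchange energy here, which is part of the motivation for the gradient corrections discussed next.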

Because electron densities vary rather strongly spatially near the nuclei, corrections to the above approximations to \(T[\rho]\) and \(E_{\text{ex,Dirac}}\) are needed. One of the more commonly used so-called **gradient-corrected** approximations is that invented by Becke, and referred to as the Becke88 exchange functional:

\[ E_{ex}(\text{Becke88}) = E_{\text{ex,Dirac}}[\rho ] -\gamma \int \dfrac{x^2\,\rho^{4/3}}{1 + 6\gamma\, x \sinh^{-1}(x)}\, d^3\textbf{r}, \nonumber \]

where \( x = \rho^{-4/3}\big|\nabla \rho \big| \), and \(\gamma\) is a parameter chosen so that the above exchange energy best reproduces the known exchange energies of specific electronic states of the inert gas atoms (Becke finds \(\gamma\) to equal 0.0042). A common gradient correction to the earlier \(T[\rho]\) is called the **Weizsacker correction** and is given by

\[ \delta\text{T}_{\text{Weizsacker}} = \dfrac{1}{72}\dfrac{\hbar^2}{m_e}\int \dfrac{\big| \nabla \rho (\textbf{r}) \big|^2}{\rho (\textbf{r})}\, d^3\textbf{r}. \nonumber \]
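To make the gradient correction concrete, it can be evaluated for a density whose gradient is known analytically. The Python sketch below (atomic units, \(\hbar = m_e = 1\); the hydrogen 1s density is an illustrative choice, not taken from the text) does this on a radial grid; for \(\rho = e^{-2r}/\pi\) one has \(|\nabla\rho| = 2\rho\), so the integral reduces to \(4\int\rho\, d^3\textbf{r} = 4\) and the correction is exactly \(4/72 = 1/18\).

```python
import numpy as np

# Weizsacker gradient correction (1/72) ∫ |grad rho|^2 / rho d^3r
# in atomic units, for the hydrogen 1s density rho = exp(-2r)/pi
# (an illustrative density choice, not from the text)
r = np.linspace(1e-6, 20.0, 200001)
dr = r[1] - r[0]
rho = np.exp(-2.0 * r) / np.pi
grad_rho = -2.0 * rho            # d rho / dr, known analytically here
dT_W = (1.0 / 72.0) * np.sum(grad_rho**2 / rho * 4.0 * np.pi * r**2) * dr
print(dT_W)  # ~1/18 = 0.0556
```

As an aside, the full Weizsacker functional (coefficient 1/8 instead of 1/72) reproduces the exact kinetic energy, 0.5 hartree, for any single-orbital density such as this one.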

Although the above discussion suggests how one might compute the ground-state energy once the ground-state density \(\rho (\textbf{r})\) is given, one still needs to know how to obtain \(\rho\). Kohn and Sham (KS) introduced a set of so-called KS orbitals obeying the following equation:

\[ \left[ -\dfrac{1}{2}\nabla^2 + V(\textbf{r}) + e^2\int \dfrac{\rho(\textbf{r}') }{\big|\textbf{r}-\textbf{r}'\big|}\, d^3\textbf{r}' + U_{xc}(\textbf{r}) \right] \phi_j = \epsilon_j \phi_j , \nonumber \]

where the so-called exchange-correlation potential \(U_{xc} (\textbf{r}) = \dfrac{\delta \text{E}_{xc}[\rho ]}{\delta\rho (\textbf{r})}\) could be obtained by functional differentiation if the exchange-correlation energy functional \(\text{E}_{xc}[\rho ]\) were known. KS also showed that the KS orbitals {\(\phi_j\)} could be used to compute the density \(\rho\) by simply adding up the orbital densities multiplied by orbital occupancies n\(_j\) :

\[ \rho (\textbf{r} ) = \sum\limits_j n_j \big| \phi_j (\textbf{r})\big|^2 . \nonumber \]

(here \(n_j\) = 0, 1, or 2 is the occupation number of the orbital \(\phi_j\) in the state being studied), and that the kinetic energy should be calculated as

\[ \text{T} = \sum\limits_j n_j \langle \phi_j (\textbf{r}) \big| -\dfrac{1}{2} \nabla^2 \big| \phi_j (\textbf{r}) \rangle . \nonumber \]
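These two relations can be checked numerically on a grid. In the Python sketch below, the ground state of a one-dimensional harmonic oscillator serves as a stand-in KS orbital (a toy, one-dimensional assumption for illustration; the text's expressions are written in three dimensions). For this orbital the kinetic-energy expectation value is 1/4 hartree per electron, so a double occupancy gives 0.5.

```python
import numpy as np

# Build rho and T from a "KS orbital" on a 1D grid; the orbital is the
# normalized harmonic-oscillator ground state (a toy assumption)
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
phi0 = np.pi ** -0.25 * np.exp(-0.5 * x**2)   # normalized ground state
n0 = 2                                         # doubly occupied

# rho(r) = sum_j n_j |phi_j|^2
rho = n0 * np.abs(phi0) ** 2
print(np.sum(rho) * dx)                        # integrates to n0 = 2

# T = sum_j n_j <phi_j| -1/2 d^2/dx^2 |phi_j>  (finite differences)
d2phi = np.gradient(np.gradient(phi0, dx), dx)
T = n0 * np.sum(phi0 * (-0.5) * d2phi) * dx
print(T)  # ~0.5 = 2 x 1/4, the HO ground-state kinetic energy
```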

The same investigations of the idealized 'uniform electron gas' that identified the Dirac exchange functional found that the correlation energy (per electron) could also be written exactly as a **function** of the electron density \(\rho\) of the system, but only in two limiting cases: the high-density limit (large \(\rho\)) and the low-density limit (small \(\rho\)). There still exists no exact expression for the correlation energy, even for the uniform electron gas, that is valid at arbitrary values of \(\rho\). Therefore, much work has been devoted to creating efficient and accurate interpolation formulas connecting the low- and high-density uniform electron gas expressions. One such expression is

\[ \text{E}_c [\rho ] = \int \rho (\textbf{r})\epsilon_c(\rho )\text{d}\textbf{r} , \nonumber \]

where

\[ \epsilon_c(\rho ) = \dfrac{A}{2}\left[ \ln\left( \dfrac{x^2}{X} \right) + \dfrac{2b}{Q} \tan^{-1}\left( \dfrac{Q}{2x + b} \right) -\dfrac{bx_0}{X_0} \left[ \ln\left( \dfrac{(x-x_0)^2}{X}\right) + \dfrac{2(b+2x_0)}{Q} \tan^{-1}\left( \dfrac{Q}{2x+b} \right) \right] \right] \nonumber \]

is the correlation energy per electron. Here \(x = \sqrt{r_s}\), \(X=x^2 +bx+c\), \(X_0 = x_0^2 + bx_0 + c\), and \(Q = \sqrt{4c - b^2}\), with \(A = 0.0621814\), \(x_0 = -0.409286\), \(b = 13.0720\), and \(c = 42.7198\). The parameter \(r_s\) is how the density \(\rho\) enters: \(\frac{4}{3} \pi r_s^3 = \frac{1}{\rho}\); that is, \(r_s\) is the radius of a sphere whose volume is the effective volume occupied by one electron. A reasonable approximation to the full \(\text{E}_{xc}[\rho ]\) would contain the Dirac (and perhaps gradient-corrected) exchange functional plus the above \(\text{E}_c[\rho ]\), but there are many alternative approximations to the exchange-correlation energy functional. Currently, many workers are doing their best to “cook up” functionals for the correlation and exchange energies, but no one has yet invented functionals that are so reliable that most workers agree to use them.
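An interpolation of this kind (it is the Vosko-Wilk-Nusair form) is easy to evaluate numerically. The Python sketch below codes \(\epsilon_c\) as a function of \(r_s\) using the constants quoted above; the functional form is the standard VWN one, which may differ in minor notational details from the display above, and the values returned are in hartrees.

```python
import numpy as np

# Uniform-gas correlation energy per electron, interpolated between the
# low- and high-density limits (VWN form, constants as quoted above)
A, x0, b, c = 0.0621814, -0.409286, 13.0720, 42.7198

def eps_c(rs):
    """Correlation energy per electron at Wigner-Seitz radius rs."""
    x = np.sqrt(rs)            # x = sqrt(r_s)
    X = x**2 + b * x + c
    X0 = x0**2 + b * x0 + c
    Q = np.sqrt(4.0 * c - b**2)
    atn = np.arctan(Q / (2.0 * x + b))
    return (A / 2.0) * (
        np.log(x**2 / X)
        + (2.0 * b / Q) * atn
        - (b * x0 / X0)
        * (np.log((x - x0) ** 2 / X) + (2.0 * (b + 2.0 * x0) / Q) * atn)
    )

# r_s enters through (4/3) pi r_s^3 = 1/rho; small r_s = high density
for rs in (0.5, 1.0, 5.0):
    print(rs, eps_c(rs))  # small negative values, more negative at high density
```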

To summarize, in implementing any DFT, one usually proceeds as follows:

- An atomic orbital basis is chosen in terms of which the KS orbitals are to be expanded.
- Some initial guess is made for the LCAO-KS expansion coefficients \(C_{j,a}\): \(\phi_j = \sum\limits_a C_{j,a}\chi_a\).
- The density is computed as \(\rho (\textbf{r}) = \sum\limits_j n_j \big|\phi_j (\textbf{r})\big|^2 \). Often, \(\rho (\textbf{r})\) is expanded in an atomic orbital basis, which need not be the same as the basis used for the \(\phi_j\), and the expansion coefficients of \(\rho\) are computed in terms of those of the \(\phi_j\). It is also common to use an atomic orbital basis to expand \(\sqrt[3]{\rho (\textbf{r})} \text{ which, together with } \rho\), is needed to evaluate the exchange-correlation functional’s contribution to E\(_0\).
- The current iteration’s density is used in the KS equations to determine the Hamiltonian \(\left[-\frac{1}{2}\nabla^2 + V(\textbf{r}) + e^2 \int \frac{\rho (\textbf{r}')}{|\textbf{r}-\textbf{r}'|}\, d^3\textbf{r}' + U_{xc}(\textbf{r}) \right]\) whose “new” eigenfunctions {\(\phi_j\)} and eigenvalues {\(\epsilon_j\)} are found by solving the KS equations.
- These new \(\phi_j\) are used to compute a new density, which, in turn, is used to solve a new set of KS equations. This process is continued until convergence is reached (i.e., until the \(\phi_j\) used to determine the current iteration’s \(\rho\) are the same \(\phi_j\) that arise as solutions on the next iteration).
- Once the converged \(\rho (\textbf{r})\) is determined, the energy can be computed using the earlier expression

\[ \text{E} [\rho ] = \sum\limits_j n_j \langle \phi_j (\textbf{r})\big| -\dfrac{1}{2} \nabla^2 \big| \phi_j (\textbf{r})\rangle + \int V(\textbf{r}) \rho (\textbf{r})\, \text{d}^3\textbf{r} + \dfrac{e^2}{2} \int\int \dfrac{\rho (\textbf{r})\, \rho (\textbf{r}')}{\big|\textbf{r}-\textbf{r}'\big|}\,\text{d}^3\textbf{r}\, \text{d}^3\textbf{r}' + \text{E}_{xc}[\rho ]. \nonumber \]
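The bulleted procedure can be sketched end-to-end for a deliberately simplified model. Everything specific in the Python sketch below is an illustrative assumption, not part of the chapter's three-dimensional formulation: a one-dimensional grid basis replaces the atomic orbital basis, the external potential is harmonic, the electron-electron repulsion is a soft Coulomb interaction, and only the Dirac exchange term (applied, as a toy, to a 1D density) stands in for \(U_{xc}\).

```python
import numpy as np

# Schematic KS-DFT SCF loop for a 1D toy model of two electrons
# (all model choices here are illustrative assumptions)
n, L = 201, 10.0
x = np.linspace(-L / 2, L / 2, n)
dx = x[1] - x[0]
V_ext = 0.5 * x**2                  # external potential V(r)
n_elec = 2                          # two electrons, one doubly occupied orbital

# kinetic operator -1/2 d^2/dx^2 via finite differences
T = -0.5 * (np.diag(np.ones(n - 1), 1) - 2.0 * np.eye(n)
            + np.diag(np.ones(n - 1), -1)) / dx**2

w = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)  # soft Coulomb kernel
C_x = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)

rho = np.full(n, n_elec / (n * dx))          # initial density guess
for it in range(500):
    V_H = (w @ rho) * dx                     # Hartree potential
    U_xc = -(4.0 / 3.0) * C_x * rho ** (1.0 / 3.0)  # Dirac exchange only
    H = T + np.diag(V_ext + V_H + U_xc)      # current KS Hamiltonian
    eps, phi = np.linalg.eigh(H)             # "new" orbitals and eigenvalues
    phi0 = phi[:, 0] / np.sqrt(dx)           # grid-normalized lowest orbital
    rho_new = n_elec * phi0**2               # new density from occupancies
    if np.max(np.abs(rho_new - rho)) < 1e-8:
        break                                # self-consistency reached
    rho = 0.5 * rho + 0.5 * rho_new          # damped density update

print(it, np.sum(rho) * dx)                  # converged; density integrates to 2
```

The damped (50/50) density update is a crude stand-in for the convergence-acceleration schemes real DFT codes use; once the loop exits, the converged density would be inserted into the energy expression above.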

In closing this section, it should once again be emphasized that this area is currently undergoing explosive growth and much scrutiny. As a result, it is nearly certain that many of the specific functionals discussed above will be replaced in the near future by improved and more rigorously justified versions. It is also likely that extensions of DFT to excited states (many workers are actively pursuing this) will be placed on more solid ground and made applicable to molecular systems. Because the computational effort involved in these approaches scales much less strongly with basis set size than for conventional (SCF, MCSCF, CI, etc.) methods, density functional methods offer great promise and are likely to contribute much to quantum chemistry in the next decade.