2.5: Introduction to the Dirac Equation



In 1928, P.A.M. Dirac proposed a relativistic formulation of the quantum mechanics of the electron from which spin emerges as a natural consequence of the relativistic treatment. Dirac's formulation becomes necessary when one is interested in the low-lying (core) states of heavy atoms, where, because of the large Coulomb forces (\(Z\) is large), the speed of electrons close to the nucleus approaches the speed of light. In addition, Dirac's theory is the basis for modern quantum electrodynamics, one of the most accurate quantum theories to date.

The problem with trying to marry quantum mechanics to Einstein's special theory of relativity is that the relativistic energy of a free particle of mass \(m\) and momentum \({\bf p}\) is given by

\[E = \sqrt{ {\bf p}^2 c^2 + m^2 c^4}\]

where \(c\) is the speed of light. Note that when \({\bf p}=0\), this reduces to Einstein's formula for the rest mass energy of a particle of mass \(m\):

\[E = mc^2\]

Note that, when

\[\vert{\bf p}\vert c \ll mc^2\]

the non-relativistic limit is approached. In this case, the energy formula can be expanded about \(\vert{\bf p}\vert=0\), to give

\[\begin{align} E&= \sqrt{m^2 c^4 \left(1 + \dfrac{ {\bf p}^2 c^2}{m^2 c^4} \right)} \\[4pt] &=mc^2 \sqrt{1 + \dfrac{ {\bf p}^2 c^2}{m^2 c^4}} \\[4pt] &\approx mc^2 \left(1 + \dfrac{ {\bf p}^2 c^2}{2m^2 c^4}\right) \\[4pt] &= mc^2 + \dfrac{ {\bf p}^2}{2m}\equiv mc^2 + E_s \end{align}\]

where \(E_s\) is defined to be the energy relative to the rest mass energy. Thus, it can be seen that when the rest mass energy is large, the kinetic energy \({\bf p}^2/2m\) is simply added on to the rest mass energy. Generally, in the non-relativistic theory, we define all energies relative to the rest mass energy.
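The quality of this expansion can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names are hypothetical, and the default units \(m = c = 1\) are an assumption made only so that the momentum argument equals the ratio \(\vert{\bf p}\vert/mc\).

```python
import math


def relativistic_energy(p, m=1.0, c=1.0):
    """Exact free-particle energy E = sqrt(p^2 c^2 + m^2 c^4)."""
    return math.sqrt((p * c) ** 2 + (m * c**2) ** 2)


def nonrelativistic_energy(p, m=1.0, c=1.0):
    """First-order expansion: rest energy plus kinetic energy p^2 / (2 m)."""
    return m * c**2 + p**2 / (2.0 * m)


def relative_error(p, m=1.0, c=1.0):
    """Relative deviation of the expansion from the exact energy."""
    exact = relativistic_energy(p, m, c)
    return abs(nonrelativistic_energy(p, m, c) - exact) / exact


if __name__ == "__main__":
    # With m = c = 1 (an assumed unit choice), p is the ratio p / (m c).
    for ratio in (0.01, 0.1, 0.5, 1.0):
        print(f"p/(mc) = {ratio:4.2f}  relative error = {relative_error(ratio):.2e}")
```

The error stays small while \(\vert{\bf p}\vert c \ll mc^2\) and grows quickly as the momentum approaches \(mc\), since the leading neglected term in the expansion is \(-{\bf p}^4/8m^3c^2\).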

The problem with formulating a relativistic Schrödinger equation is the energy expression itself. If we naively try to generate a Hamiltonian by promoting the classical variable \({\bf p}\) to a quantum operator \({\bf P}\), then we would have a Hamiltonian of the form:

\[H = \sqrt{ {\bf P}^2 c^2 + m^2 c^4}\]

and we have no straightforward way to interpret the square root of an operator.

Various attempts were made to circumvent this problem. One such attempt involved simply squaring the Hamiltonian in the Schrödinger equation, so that one would have

\[H^2 = {\bf P}^2 c^2 + m^2 c^4\]

Applying the Schrödinger time derivative twice then gives \(-\hbar^2\,\partial^2 \vert\psi(t)\rangle/\partial t^2 = H^2 \vert\psi(t)\rangle\). This generates a kind of wave equation, called the Klein-Gordon equation, that has two solutions of the general form

\[e^{-iHt/\hbar} \vert\psi(0)\rangle\]

and

\[e^{iHt/\hbar} \vert\psi(0)\rangle\]

i.e., both forward and backward propagating solutions. It was later suggested that the backward propagating solutions should correspond to anti-particle solutions. Feynman's proposal was that anti-particles should be viewed as particles traveling backward in time, and this notion remains in use today.

The problem with the Klein-Gordon equation is that it does not incorporate spin and thus will only work for spinless particles. The idea of Dirac was to demand that there be a Hamiltonian that is linear in \({\bf P}\) such that the square of \(H\) would give the required formula \(H^2 = {\bf P}^2 c^2 + m^2 c^4\). He took a general Hamiltonian of the form

\[H = c \vec{\alpha} \cdot {\bf P} + \beta m c^2\]

where \(\vec{\alpha} = \left(\alpha_x, \alpha_y, \alpha_z\right)\) and \(\beta\) are parameters to be determined by the \(H^2\) condition. But look at \(H^2\):

\[\begin{align} H^2 &= c^2\left(\vec{\alpha}\cdot{\bf P}\right)^2 + mc^3\left[\beta\left(\vec{\alpha}\cdot{\bf P}\right) + \left(\vec{\alpha}\cdot{\bf P}\right)\beta\right] + \beta^2 m^2 c^4 \\[4pt] &= c^2\left(\alpha_x P_x + \alpha_y P_y + \alpha_z P_z\right)^2 + mc^3\left[\beta\left(\vec{\alpha}\cdot{\bf P}\right) + \left(\vec{\alpha}\cdot{\bf P}\right)\beta\right] + \beta^2 m^2 c^4 \\[4pt] &= c^2\left[\alpha_x^2 P_x^2 + \alpha_y^2 P_y^2 + \alpha_z^2 P_z^2 + \left(\alpha_x\alpha_y + \alpha_y \alpha_x\right)P_x P_y + \left(\alpha_x\alpha_z + \alpha_z \alpha_x\right)P_x P_z + \left(\alpha_y\alpha_z + \alpha_z \alpha_y\right)P_y P_z \right] \\[4pt] &\quad + mc^3\left[\beta\left(\alpha_x P_x + \alpha_y P_y + \alpha_z P_z\right) + \left(\alpha_x P_x + \alpha_y P_y + \alpha_z P_z\right)\beta\right] + \beta^2 m^2 c^4 \end{align}\]

This expression is required to equal \(\vert{\bf P}\vert^2 c^2 + m^2 c^4\).

Thus, we see that the required condition is satisfied if \(\vec{\alpha}\) and \(\beta\) satisfy the following:

\[\begin{align} \alpha_x^2 = \alpha_y^2 = \alpha_z^2 &= 1 \\[4pt] \alpha_x\alpha_y + \alpha_y \alpha_x &= 0 \\[4pt] \alpha_x\alpha_z + \alpha_z \alpha_x &= 0 \\[4pt] \alpha_y\alpha_z + \alpha_z \alpha_y &= 0 \\[4pt] \beta\alpha_x + \alpha_x \beta &= 0 \\[4pt] \beta\alpha_y + \alpha_y \beta &= 0 \\[4pt] \beta\alpha_z + \alpha_z \beta &= 0 \\[4pt] \beta^2 &= 1 \end{align}\]

These conditions can only be satisfied if \(\vec{\alpha}\) and \(\beta\) are matrices! Indeed, we need a total of four anticommuting matrices, none of which is the identity matrix. In addition, we can show that the matrices must all be traceless. To see this, note that because \(\beta^2 = 1\), it follows that \(\beta = \beta^{-1}\), and similarly, it can be seen that \(\alpha_x = \alpha_x^{-1}\) and the same for \(\alpha_y\) and \(\alpha_z\). Thus, using the fact that

\[\beta \alpha_x = - \alpha_x \beta \quad \Rightarrow \quad \alpha_x = - \beta^{-1} \alpha_x \beta\]

and taking the trace of both sides, we find, by the cyclic property of the trace, that

\[\operatorname{Tr}\left(\alpha_x\right) = - \operatorname{Tr}\left(\beta^{-1} \alpha_x \beta\right) = - \operatorname{Tr}\left(\alpha_x \beta \beta^{-1}\right) = - \operatorname{Tr}\left(\alpha_x\right)\]

Thus, since \(\operatorname{Tr}\left(\alpha_x\right) = - \operatorname{Tr}\left(\alpha_x\right)\), it follows that \(\operatorname{Tr}\left(\alpha_x\right) = 0\). The same argument can be applied to \(\alpha_y\), \(\alpha_z\) and \(\beta\).

Thus, we need a set of four traceless, anticommuting matrices. It turns out that the minimum dimension needed to satisfy these conditions is 4, and, therefore, \(\vec{\alpha}\) and \(\beta\) are \(4 \times 4\) matrices. One possible representation of the matrices is in terms of the Pauli matrices \(\vec{\sigma}\) and the identity \(\mathrm{I}\) and takes the form:

\[\vec{\alpha} = \left( \begin{array}{cc} 0 & \vec{\sigma} \\ \vec{\sigma} & 0 \end{array} \right)\]

and

\[\beta = \left( \begin{array}{cc} \mathrm{I} & 0 \\ 0 & - \mathrm{I} \end{array} \right)\]

where each element is a 2x2 sub-block of the 4x4 matrix.
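As a check on this representation, the conditions on \(\vec{\alpha}\) and \(\beta\) can be verified by direct matrix multiplication. The sketch below is illustrative only: it assumes the standard Pauli matrices, builds the block matrices \(\alpha_i\) (with \(\sigma_i\) in the off-diagonal blocks) and \(\beta\) (with \(\pm\mathrm{I}\) on the diagonal) as plain nested lists, and uses hypothetical helper names such as `block` and `satisfies_dirac_algebra`.

```python
# 2x2 building blocks: identity, its negative, zero, and the Pauli matrices.
I2 = [[1, 0], [0, 1]]
NEG_I2 = [[-1, 0], [0, -1]]
ZERO2 = [[0, 0], [0, 0]]
SIGMA = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from four 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def anticommutator(x, y):
    """Return x y + y x."""
    xy, yx = matmul(x, y), matmul(y, x)
    n = len(x)
    return [[xy[i][j] + yx[i][j] for j in range(n)] for i in range(n)]


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def scaled_identity(n, s):
    return [[s if i == j else 0 for j in range(n)] for i in range(n)]


# alpha_i has sigma_i in the off-diagonal blocks; beta is diag(I, -I).
ALPHA = {k: block(ZERO2, s, s, ZERO2) for k, s in SIGMA.items()}
BETA = block(I2, ZERO2, ZERO2, NEG_I2)
DIRAC = list(ALPHA.values()) + [BETA]


def satisfies_dirac_algebra(mats):
    """True if each matrix is traceless and squares to 1, and distinct pairs anticommute."""
    n = len(mats[0])
    for i, a in enumerate(mats):
        if trace(a) != 0:
            return False
        for j, b in enumerate(mats):
            # {a, a} = 2 a^2 must equal 2 * identity; {a, b} must vanish for a != b.
            if anticommutator(a, b) != scaled_identity(n, 2 if i == j else 0):
                return False
    return True


if __name__ == "__main__":
    print(satisfies_dirac_algebra(DIRAC))
```

The three Pauli matrices pass the same check among themselves; it is the need for a fourth matrix anticommuting with all three that forces the move from 2x2 to 4x4 matrices.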
