4.4: Entropy and Information


    Gibbs Entropy

    For a system with a countable number of microstates, an ensemble entropy can be defined as a weighted sum over the entropies of all microstates, which are in turn expressed as \(-k_\mathrm{B} \ln P_i\), in analogy to Boltzmann’s entropy definition for a macrostate.

    \[S = -k_\mathrm{B} \sum_i P_i \ln P_i \ .\]

    This is the definition of Gibbs entropy, while Boltzmann entropy is assigned to an individual microstate. Note that we have used a capital \(S\) because Gibbs entropy is a molecular entropy. Using Equation \ref{eq:Boltzmann_distribution}, we obtain for the system entropy \(s = N S\),

    \[\begin{align} s & = -k_\mathrm{B} N \sum_i P_i \left( -\frac{\epsilon_i}{k_\mathrm{B} T} - \ln Z \right) \\ & = \frac{u}{T} + k_\mathrm{B} \ln z \ , \label{eq:Gibbs_system_entropy}\end{align}\]

    where we have assumed distinguishable particles, so that \(\ln z = N \ln Z\). We have recovered Equation \ref{eq:s_from_z}, which we had derived for the system entropy starting from Boltzmann entropy and assuming a canonical ensemble. For a canonical ensemble of distinguishable particles, either concept can be used. As noted above, Gibbs entropy leads to the paradox of a positive mixing entropy when two subsystems made up of the same ideal gas are combined. More generally, Gibbs entropy is not extensive if the particles are indistinguishable. The problem can be solved by redefining the system partition function as in Equation \ref{eq:z_indist}.
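
    The identity in Equation \ref{eq:Gibbs_system_entropy} is easy to check numerically. The following sketch is not part of the original text; it assumes a hypothetical three-level molecule with arbitrary illustrative level energies and temperature, and confirms that \(-k_\mathrm{B} \sum_i P_i \ln P_i\) equals \(\langle \epsilon \rangle / T + k_\mathrm{B} \ln Z\) per molecule.

```python
# Numerical check of the per-molecule form of Equation (Gibbs_system_entropy).
# The level energies and temperature are arbitrary illustrative values.
import numpy as np

k_B = 1.380649e-23                          # Boltzmann constant in J/K
T = 300.0                                   # temperature in K
eps = np.array([0.0, 1.0e-21, 2.5e-21])     # hypothetical level energies in J

Z = np.sum(np.exp(-eps / (k_B * T)))        # molecular partition function
P = np.exp(-eps / (k_B * T)) / Z            # Boltzmann probabilities P_i

S_gibbs = -k_B * np.sum(P * np.log(P))      # -k_B sum_i P_i ln P_i
S_canon = np.sum(P * eps) / T + k_B * np.log(Z)   # <eps>/T + k_B ln Z

print(S_gibbs, S_canon)                     # the two values agree
```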

    This indistinguishability problem suggests that entropy is related to the information we have about the system. Consider mixing of \(\ce{^{13}CO2}\) with \(\ce{^{12}CO2}\).15 At a time when isotopes were unknown, the two gases could not be distinguished and the mixing entropy was zero. With a sufficiently sensitive spectrometer we could nowadays observe the mixing process by \(^{13}\mathrm{C}\) NMR. We would observe spontaneous mixing. Quite obviously, the mixing entropy is not zero anymore.
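
    For orientation (this worked value is not part of the original text), the ideal mixing entropy per mole of an equimolar mixture of two distinguishable gases, with mole fractions \(x_1 = x_2 = 1/2\), is

    \[\Delta s_\mathrm{mix} = -R \left( x_1 \ln x_1 + x_2 \ln x_2 \right) = R \ln 2 \approx 5.76\ \mathrm{J\,K^{-1}\,mol^{-1}} \ ,\]

    whereas it vanishes if the two gases cannot be distinguished.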

    This paradox cautions against philosophical interpretation of entropy. Entropy is a quantity that can be used for predicting the outcome of physical experiments. It presumes an observer and depends on the information that the observer has or can obtain.16 Statistical mechanics provides general recipes for defining entropy, but the details of a proper definition depend on experimental context.

    Unlike the system entropy derived from Boltzmann entropy via the canonical ensemble, Gibbs entropy is, in principle, defined for non-equilibrium states as well. Because it is based on the same probability concept, the Gibbs entropy of an isolated system is smaller for non-equilibrium states than for the equilibrium state.
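
    This maximization property is easy to illustrate numerically. The following sketch is not from the original text; it assumes that the equilibrium state of an isolated system corresponds to a uniform distribution over its accessible microstates, and the non-uniform probabilities are arbitrary illustrative values.

```python
# Gibbs entropy of a non-uniform (non-equilibrium) distribution compared with
# the uniform (equilibrium) distribution over the same three microstates.
import numpy as np

k_B = 1.380649e-23  # Boltzmann constant in J/K

def gibbs_entropy(P):
    """Gibbs entropy -k_B sum_i P_i ln P_i for a normalized distribution P."""
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                       # terms with P_i = 0 contribute nothing
    return -k_B * np.sum(P * np.log(P))

P_noneq = [0.7, 0.2, 0.1]              # some non-equilibrium distribution
P_eq    = [1/3, 1/3, 1/3]              # uniform (equilibrium) distribution

print(gibbs_entropy(P_noneq) < gibbs_entropy(P_eq))   # True: k_B ln 3 is maximal
```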

    Von Neumann Entropy

    The concept of Gibbs entropy for a countable set of discrete states and their probabilities is easily extended to a continuous phase space with a probability density or, in quantum mechanics, to a density matrix. The latter leads to the von Neumann entropy,

    \[S = -k_\mathrm{B} \mathrm{Trace}\left\{ \rho \ln \rho \right\} \ , \label{eq:von_Neumann_entropy}\]

    where \(\rho\) is the density matrix. Some physics textbooks do not distinguish von Neumann entropy from Gibbs entropy. Von Neumann entropy is a constant of motion if an ensemble of classical systems evolves according to the Liouville equation or a quantum mechanical system evolves according to the Liouville-von Neumann equation. It therefore cannot describe the approach of an isolated system to equilibrium. Coupling of the quantum mechanical system to an environment can be described by the stochastic Liouville equation

    \[\frac{\partial \widehat{\rho}}{\partial t} = -\frac{i}{\hbar} \left[ \mathcal{\widehat{H}}, \widehat{\rho} \right] + \widehat{\widehat{\Gamma}} \left( \widehat{\rho} - \widehat{\rho}_\mathrm{eq} \right) \ ,\]

    where \(\widehat{\widehat{\Gamma}}\) is a Markovian operator and \(\widehat{\rho}_\mathrm{eq}\) is the density matrix at equilibrium. This equation of motion can describe quantum dissipative systems, i.e., the approach to equilibrium, without relying explicitly on the concept of entropy, except for the computation of \(\widehat{\rho}_\mathrm{eq}\), which relies on a generalization of the Boltzmann distribution (see Section [subsection:q_partition]). However, to derive the Markovian operator \(\widehat{\widehat{\Gamma}}\), explicit assumptions on the coupling between the quantum mechanical system and its environment must be made, which is beyond the scope of this lecture course.
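
    The following sketch is not part of the original text; it evaluates Equation \ref{eq:von_Neumann_entropy} via the eigenvalues of a hypothetical two-level density matrix and checks that purely unitary Liouville-von Neumann evolution leaves the von Neumann entropy unchanged, in line with its being a constant of motion. The density matrix and Hamiltonian are arbitrary illustrative choices.

```python
# Von Neumann entropy S = -k_B Tr(rho ln rho), computed from the eigenvalues
# p_k of the density matrix, and invariance under unitary evolution.
import numpy as np

k_B = 1.380649e-23   # Boltzmann constant in J/K

def von_neumann_entropy(rho):
    """S = -k_B sum_k p_k ln p_k over the eigenvalues p_k of rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                       # zero eigenvalues contribute nothing
    return -k_B * np.sum(p * np.log(p))

# Hypothetical mixed state of a two-level system (Hermitian, trace 1)
rho = np.array([[0.8, 0.1],
                [0.1, 0.2]], dtype=complex)

# Unitary propagator U = exp(-i H t / hbar), built from the eigendecomposition
# of an arbitrary Hermitian H; hbar and t are absorbed into dimensionless H*t.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]], dtype=complex)
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * 0.37)) @ V.conj().T

rho_t = U @ rho @ U.conj().T               # Liouville-von Neumann evolution

print(von_neumann_entropy(rho), von_neumann_entropy(rho_t))  # identical values
```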

    Shannon Entropy

    The concept of entropy has also been introduced into information theory. For a discrete random variable that can take values \(a_j\) with probabilities \(P(a_j)\), the Shannon entropy is defined as

    \[H_\mathrm{Shannon}\left( a \right) = -\sum_j P(a_j) \log_2 P(a_j) \ .\]

    A logarithm to base 2 is used here because the information is assumed to be encoded in binary digits. Unlike for discrete states in statistical mechanics, an event may be in the set but still have a probability \(P(a_j) = 0\). In such cases, \(P(a_j) \log_2 P(a_j)\) is set to zero. Shannon entropy is larger the ’more random’ the distribution is or, more precisely, the closer the distribution is to a uniform distribution. Information is considered a deviation from a random stream of numbers or characters: the higher the information content, the lower the entropy.
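
    A short sketch (not from the original text; the distributions are arbitrary illustrative values) shows how the definition, including the convention that terms with \(P(a_j) = 0\) are set to zero, can be evaluated:

```python
# Shannon entropy in bits, H = -sum_j P(a_j) log2 P(a_j), with terms for
# P(a_j) = 0 set to zero.
import numpy as np

def shannon_entropy(P):
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                        # 0 * log2(0) is taken as 0
    return -np.sum(P * np.log2(P))

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits (uniform, maximal)
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits (no uncertainty)
```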

    Shannon entropy can be related to the reduced Gibbs entropy \(\sigma = S/k_\mathrm{B}\). It is the amount of Shannon information required to specify the microstate of the system if the macrostate is known. When expressed with the binary logarithm, this amount of Shannon information corresponds to the number of yes/no questions that would have to be answered to specify the microstate. We note that this is exactly the type of experiment presumed in the second Penrose postulate (Section [Penrose_postulates]). The more microstates are consistent with the observed macrostate, the larger this number of questions and the larger the Shannon and Gibbs entropies. The concept applies to non-equilibrium states as well as to equilibrium states. It follows what G. N. Lewis stated before Shannon: "Gain in entropy always means loss of information, and nothing more". The equilibrium state is the macrostate that provides the least information about the underlying microstate.
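
    To make the correspondence explicit (a short check, not in the original text), consider a macrostate compatible with \(W\) equally probable microstates, so that \(P_i = 1/W\). Then

    \[\sigma = \frac{S}{k_\mathrm{B}} = \ln W \qquad \text{and} \qquad H_\mathrm{Shannon} = \log_2 W = \frac{\ln W}{\ln 2} \ ,\]

    so the reduced Gibbs entropy and the Shannon information needed to single out one microstate differ only by the constant factor \(\ln 2\), i.e., \(\sigma = H_\mathrm{Shannon} \ln 2\).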

    We can further associate order with information, as any ordered arrangement of objects contains information on how they are ordered. In that sense, loss of order is loss of information, and an increase of disorder is an increase in entropy. The link arises via probability, as the total number of arrangements is much larger than the number of arrangements that conform to a certain ordering principle. Nevertheless, the association of entropy with disorder is only colloquial, because in most cases we do not have quantitative descriptions of order.
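
    As a toy counting example (not part of the original text), consider \(N\) distinguishable objects that can be arranged in \(N!\) ways, only one of which satisfies a given ordering principle, say alphabetical order. The reduced entropy difference between the unconstrained and the ordered macrostate is then

    \[\Delta \sigma = \ln \frac{W_\mathrm{total}}{W_\mathrm{ordered}} = \ln N! \approx N \ln N - N \ ,\]

    which grows rapidly with \(N\); already for \(N = 10\), \(\ln 10! \approx 15\).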


    This page titled 4.4: Entropy and Information is shared under a CC BY-NC 3.0 license and was authored, remixed, and/or curated by Gunnar Jeschke via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.