<h2>CHAPTER 1</h2><p><b>Information and Coding Theory</b></p><br><p>In his classic paper 'A Mathematical Theory of Communication', Claude Shannon introduced the main concepts and theorems of what is known as information theory. Definitions and models for two important elements are presented in this theory. These elements are the binary source (BS) and the binary symmetric channel (BSC). A binary source is a device that generates one of the two possible symbols '0' and '1' at a given rate <i>r</i>, measured in symbols per second. These symbols are called <i>bits</i> (binary digits) and are generated randomly.</p><p>The BSC is a medium through which it is possible to transmit one symbol per time unit. However, this channel is not reliable, and is characterized by the error probability <i>p</i> (0 ≤ <i>p</i> ≤ 1/2) that an output bit can be different from the corresponding input. The symmetry of this channel comes from the fact that the error probability <i>p</i> is the same for both of the symbols involved.</p><p>Information theory analyses communication between a transmitter and a receiver through an unreliable channel. On the one hand, it analyses information sources, especially the amount of information produced by a given source; on the other hand, it states the conditions for performing reliable transmission through an unreliable channel.</p><p>There are three main concepts in this theory:</p><p>1. The first one is the definition of a quantity that can be a valid measurement of information, which should be consistent with a physical understanding of its properties.</p><p>2. The second concept deals with the relationship between the information and the source that generates it. This concept will be referred to as source information. Well-known information theory techniques like compression and encryption are related to this concept.</p><p>3. The third concept deals with the relationship between the information and the unreliable channel through which it is going to be transmitted. This concept leads to the definition of a very important parameter called the channel capacity. A well-known information theory technique called error-correction coding is closely related to this concept. This type of coding forms the main subject of this book.</p><br><p>One of the most used techniques in information theory is a procedure called coding, which is intended to optimize transmission and to make efficient use of the capacity of a given channel. In general terms, coding is a bijective assignment between a set of messages to be transmitted and a set of codewords that are used for transmitting these messages. Usually this procedure adopts the form of a table in which each message of the transmission is in correspondence with the codeword that represents it (see an example in Table 1.1).</p><p>Table 1.1 shows four codewords used for representing four different messages. As seen in this simple example, the length of the codewords is not constant. One important property of a coding table is that it is constructed in such a way that every codeword is uniquely decodable. This means that in the transmission of a sequence composed of these codewords there should be only one possible way of interpreting that sequence. This is necessary when variable-length coding is used.</p><p>If the code shown in Table 1.1 is compared with a constant-length code for the same case, constituted from four codewords of two bits, 00, 01, 10, 11, it is seen that the code in Table 1.1 adds redundancy. Assuming equally likely messages, the average number of transmitted bits per symbol is equal to 2.75. However, if for instance symbol <i>s</i><sub>2</sub> were characterized by a probability of being transmitted of 0.76, and all other symbols in this code were characterized by a probability of being transmitted equal to 0.08, then this source would transmit an average number of bits per symbol equal to 2.24. As seen in this simple example, a level of compression is possible when the information source is not uniform, that is, when a source generates messages that are not equally likely.</p>
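<p>The two averages just quoted can be reproduced with a short calculation. The following Python sketch is only illustrative: since Table 1.1 is not reproduced here, the codeword lengths {3, 2, 3, 3} used below are an assumption, chosen so that the resulting averages agree with the 2.75-bit and 2.24-bit figures given in the text.</p><pre>
# Average number of transmitted bits per symbol for a variable-length code.
# The codeword lengths below are assumed (Table 1.1 is not shown here); they
# are chosen so that the averages match the figures quoted in the text.
lengths = [3, 2, 3, 3]                 # assumed lengths of the four codewords

def average_length(probabilities, lengths):
    """Expected number of bits per transmitted symbol."""
    return sum(p * n for p, n in zip(probabilities, lengths))

print(average_length([0.25, 0.25, 0.25, 0.25], lengths))   # 2.75 (equally likely messages)
print(average_length([0.08, 0.76, 0.08, 0.08], lengths))   # ≈ 2.24 (s2 much more likely)
</pre>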
<p>The source information measure, the channel capacity measure and coding are all related by one of the Shannon theorems, the channel coding theorem, which is stated as follows:</p><p><i>If the information rate of a given source does not exceed the capacity of a given channel, then there exists a coding technique that makes possible transmission through this unreliable channel with an arbitrarily low error rate.</i></p><br><p>This important theorem predicts the possibility of error-free transmission through a noisy or unreliable channel. This is obtained by using coding. The above theorem is due to Claude Shannon, and states the restrictions on the transmission of information through a noisy channel, stating also that the solution for overcoming those restrictions is the application of a rather sophisticated coding technique. What is not formally stated is how to implement this coding technique.</p><p>A block diagram of a communication system as related to information theory is shown in Figure 1.1.</p><p>The block diagram seen in Figure 1.1 shows two types of encoders. The channel encoder is designed to perform error correction with the aim of converting an unreliable channel into a reliable one. On the other hand, there also exists a source encoder that is designed to make the source information rate approach the channel capacity. The destination is also called the information sink.</p><p>Some concepts relating to the transmission of discrete information are introduced in the following sections.</p><br><p><b>1.1 Information</b></p><p><i>1.1.1 A Measure of Information</i></p><p>From the point of view of information theory, information is not knowledge, as commonly understood, but instead relates to the probabilities of the symbols used to send messages between a source and a destination over an unreliable channel. A quantitative measure of symbol information is related to its probability of occurrence, either as it emerges from a source or when it arrives at its destination. The less likely the occurrence of a symbol, the higher is the information provided by this event. This suggests that a quantitative measure of symbol information will be inversely related to its probability of occurrence.</p><p>Assume that an arbitrary message <i>x<sub>i</sub></i> is one of the possible messages from the set that a given discrete source can emit, and that <i>P(x<sub>i</sub>) = P<sub>i</sub></i> is the probability that this message is emitted. The output of this information source can then be modelled as a random variable <i>X</i> that can adopt any of the possible values <i>x<sub>i</sub></i>, so that <i>P(X = x<sub>i</sub>) = P<sub>i</sub></i>. Shannon defined a measure of the information for the event <i>x<sub>i</sub></i> by using a logarithmic measure operating over the base <i>b</i>:</p><p><i>I<sub>i</sub></i> ≡ −log<sub><i>b</i></sub> <i>P<sub>i</sub></i> = log<sub><i>b</i></sub>(1/<i>P<sub>i</sub></i>) (1)</p>
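<p>As a quick numerical illustration of equation (1), the short Python sketch below evaluates the information of an event for a few probabilities, both to base 2 (bits) and to the natural base (nats, discussed just below); the function name is only illustrative.</p><pre>
import math

def information(p, base=2):
    """Information of an event with probability p: I = log_base(1/p)."""
    return math.log(1.0 / p, base)

print(information(0.5))           # 1.0 bit: one of two equally likely events
print(information(0.5, math.e))   # ≈ 0.693 nats: the same event in natural units
print(information(0.25))          # 2.0 bits: a less likely event carries more information
</pre>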
<p>The information of the event depends only on its probability of occurrence, and is not dependent on its content.</p><p>The base of the logarithmic measure can be converted by using</p><p>log<sub><i>a</i></sub>(<i>x</i>) = log<sub><i>b</i></sub>(<i>x</i>)/log<sub><i>b</i></sub>(<i>a</i>) (2)</p><br><p>If this measure is calculated to base 2, the information is said to be measured in <i>bits</i>. If the measure is calculated using natural logarithms, the information is said to be measured in <i>nats</i>. As an example, if the event is characterized by a probability of <i>P<sub>i</sub></i> = 1/2, the corresponding information is <i>I<sub>i</sub></i> = 1 bit. From this point of view, a bit is the amount of information obtained from one of two possible, and equally likely, events. This use of the term <i>bit</i> is essentially different from what has been described as the binary digit. In this sense the bit acts as the unit of the measure of information.</p><p>Some properties of information are derived from its definition:</p><p><i>I<sub>i</sub></i> ≥ 0 for 0 ≤ <i>P<sub>i</sub></i> ≤ 1</p><p><i>I<sub>i</sub></i> → 0 if <i>P<sub>i</sub></i> → 1</p><p><i>I<sub>i</sub> > I<sub>j</sub></i> if <i>P<sub>i</sub> < P<sub>j</sub></i></p><br><p>For any two independent source messages <i>x<sub>i</sub></i> and <i>x<sub>j</sub></i> with probabilities <i>P<sub>i</sub></i> and <i>P<sub>j</sub></i> respectively, and with joint probability <i>P(x<sub>i</sub>, x<sub>j</sub>) = P<sub>i</sub>P<sub>j</sub></i>, the information of the two messages is the addition of the information in each message:</p><p><i>I<sub>ij</sub></i> = log<sub><i>b</i></sub>[1/(<i>P<sub>i</sub>P<sub>j</sub></i>)] = log<sub><i>b</i></sub>(1/<i>P<sub>i</sub></i>) + log<sub><i>b</i></sub>(1/<i>P<sub>j</sub></i>) = <i>I<sub>i</sub></i> + <i>I<sub>j</sub></i></p><br><p><b>1.2 Entropy and Information Rate</b></p><p>In general, an information source generates any of a set of <i>M</i> different symbols, which are considered as representatives of a discrete random variable <i>X</i> that adopts any value in the range <i>A</i> = {<i>x</i><sub>1</sub>, <i>x</i><sub>2</sub>, ..., <i>x<sub>M</sub></i>}. Each symbol <i>x<sub>i</sub></i> has the probability <i>P<sub>i</sub></i> of being emitted and contains information <i>I<sub>i</sub></i>. The symbol probabilities must be in agreement with the fact that at least one of them will be emitted, so</p><p>∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> = 1 (3)</p><p>The source symbol probability distribution is stationary, and the symbols are independent and transmitted at a rate of <i>r</i> symbols per second. This description corresponds to a discrete memoryless source (DMS), as shown in Figure 1.2.</p><p>Each symbol contains the information <i>I<sub>i</sub></i>, so that the set {<i>I</i><sub>1</sub>, <i>I</i><sub>2</sub>, ..., <i>I<sub>M</sub></i>} can be seen as a discrete random variable with average information</p><p><i>H(X)</i> = ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub>I<sub>i</sub></i> = ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub><i>b</i></sub>(1/<i>P<sub>i</sub></i>) (4)</p><p>The function so defined is called the entropy of the source. When base 2 is used, the entropy is measured in bits per symbol:</p><p><i>H(X)</i> = ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>(1/<i>P<sub>i</sub></i>) bits per symbol (5)</p><p>The symbol information value when <i>P<sub>i</sub></i> = 0 is mathematically undefined. To resolve this situation, the following condition is imposed: <i>I<sub>i</sub></i> = ∞ if <i>P<sub>i</sub></i> = 0. Since <i>P<sub>i</sub></i> log<sub>2</sub>(1/<i>P<sub>i</sub></i>) → 0 as <i>P<sub>i</sub></i> → 0 (L'Hôpital's rule), this term is taken to be zero when <i>P<sub>i</sub></i> = 0. On the other hand, <i>P<sub>i</sub></i> log<sub>2</sub>(1/<i>P<sub>i</sub></i>) = 0 if <i>P<sub>i</sub></i> = 1.</p>
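<p>A direct transcription of equation (5) into code may help fix the definition. The following Python sketch (illustrative only) computes <i>H(X)</i> in bits per symbol and applies the convention that terms with <i>P<sub>i</sub></i> = 0 contribute nothing.</p><pre>
import math

def entropy(probabilities):
    """H(X) = sum of P_i * log2(1/P_i), in bits per symbol.

    Terms with P_i = 0 are taken as 0, following the convention in the text.
    """
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# A uniform four-symbol source: log2(4) = 2 bits per symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
# A very biased source has low entropy (little average information per symbol).
print(entropy([0.97, 0.01, 0.01, 0.01]))   # ≈ 0.24
</pre>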
<p><i>Example 1.1:</i> Suppose that a DMS is defined over the range of <i>X</i>, <i>A</i> = {<i>x</i><sub>1</sub>, <i>x</i><sub>2</sub>, <i>x</i><sub>3</sub>, <i>x</i><sub>4</sub>}, and the corresponding probability values for each symbol are <i>P(X = x<sub>1</sub>)</i> = 1/2, <i>P(X = x<sub>2</sub>) = P(X = x<sub>3</sub>)</i> = 1/8 and <i>P(X = x<sub>4</sub>)</i> = 1/4.</p><p>The entropy of this DMS is evaluated as</p><p><i>H(X)</i> = (1/2) log<sub>2</sub>(2) + (1/8) log<sub>2</sub>(8) + (1/8) log<sub>2</sub>(8) + (1/4) log<sub>2</sub>(4) = 1.75 bits per symbol</p><p><i>Example 1.2:</i> A source characterized in the frequency domain with a bandwidth of <i>W</i> = 4000 Hz is sampled at the Nyquist rate, generating a sequence of values taken from the range <i>A</i> = {-2, -1, 0, 1, 2} with the following corresponding set of probabilities {1/2, 1/4, 1/8, 1/16, 1/16}. Calculate the source rate in bits per second.</p><p>The entropy is first evaluated as</p><p><i>H(X)</i> = (1/2) log<sub>2</sub>(2) + (1/4) log<sub>2</sub>(4) + (1/8) log<sub>2</sub>(8) + (1/16) log<sub>2</sub>(16) + (1/16) log<sub>2</sub>(16) = 1.875 bits per symbol</p><p>The minimum sampling frequency is equal to 8000 samples per second, so that the information rate is equal to 8000 × 1.875 = 15 kbps.</p><p>Entropy can be evaluated to a different base by using</p><p><i>H<sub>b</sub>(X)</i> = <i>H(X)</i>/log<sub>2</sub>(<i>b</i>) (6)</p><br><p>Entropy <i>H(X)</i> can be understood as the mean value of the information per symbol provided by the source being measured, or, equivalently, as the mean value of the uncertainty of an observer before knowing the source output. In another sense, entropy is a measure of the randomness of the source being analysed. The entropy function thus provides an adequate quantitative measure of the parameters of a given source and is in agreement with a physical understanding of the information emitted by a source.</p><p>Another interpretation of the entropy function is seen by assuming that if <i>n</i> >> 1 symbols are emitted, <i>nH(X)</i> bits is the total amount of information emitted. As the source generates <i>r</i> symbols per second, the whole emitted sequence takes <i>n/r</i> seconds. Thus, information will be transmitted at a rate of</p><p><i>nH(X)</i>/(<i>n/r</i>) bps (7)</p><p>The information rate is then equal to</p><p><i>R = rH(X)</i> bps (8)</p><p>The Shannon theorem states that information provided by a given DMS can be coded using binary digits and transmitted over an equivalent noise-free channel at a rate of</p><p><i>r<sub>b</sub></i> ≥ <i>R</i> symbols or binary digits per second</p><p>It is again noted here that the bit is the unit of information, whereas the symbol or binary digit is one of the two possible symbols or signals '0' or '1', usually also called bits.</p>
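<p>The numbers in Example 1.2 can be checked with a few lines of Python. This is only a sanity check of equations (5) and (8), under the assumption that the probability set is {1/2, 1/4, 1/8, 1/16, 1/16} as given above, with <i>r</i> = 8000 symbols per second (the Nyquist rate for <i>W</i> = 4000 Hz).</p><pre>
import math

# Probabilities of the five sample values in Example 1.2.
probs = [1/2, 1/4, 1/8, 1/16, 1/16]

# Entropy, equation (5): H(X) = sum of P_i * log2(1/P_i), in bits per symbol.
H = sum(p * math.log2(1 / p) for p in probs)

# Information rate, equation (8): R = r * H(X), with r = 8000 symbols per second.
r = 8000
R = r * H

print(H)   # 1.875 bits per symbol
print(R)   # 15000.0 bits per second, i.e. 15 kbps
</pre>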
<p>Theorem 1.1: Let <i>X</i> be a random variable that adopts values in the range <i>A</i> = {<i>x</i><sub>1</sub>, <i>x</i><sub>2</sub>, ..., <i>x<sub>M</sub></i>} and represents the output of a given source. Then it is possible to show that</p><p>0 ≤ <i>H(X)</i> ≤ log<sub>2</sub>(<i>M</i>) (9)</p><p>Additionally,</p><p><i>H(X)</i> = 0 if and only if <i>P<sub>i</sub></i> = 1 for some <i>i</i></p><p><i>H(X)</i> = log<sub>2</sub>(<i>M</i>) if and only if <i>P<sub>i</sub></i> = 1/<i>M</i> for every <i>i</i> (10)</p><p>The condition 0 ≤ <i>H(X)</i> can be verified by applying the following:</p><p><i>P<sub>i</sub></i> log<sub>2</sub>(1/<i>P<sub>i</sub></i>) → 0 if <i>P<sub>i</sub></i> → 0</p><br><p>The condition <i>H(X)</i> ≤ log<sub>2</sub>(<i>M</i>) can be verified in the following manner:</p><p>Let <i>Q</i><sub>1</sub>, <i>Q</i><sub>2</sub>, ..., <i>Q<sub>M</sub></i> be arbitrary probability values that are used to replace the terms 1/<i>P<sub>i</sub></i> by the terms <i>Q<sub>i</sub>/P<sub>i</sub></i> in the expression of the entropy [equation (5)]. Then the following inequality is used:</p><p>ln(<i>x</i>) ≤ <i>x</i> − 1</p><p>where equality occurs if <i>x</i> = 1 (see Figure 1.3).</p><p>After converting the resulting expression to its natural logarithmic form, we obtain</p><p>∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>(<i>Q<sub>i</sub>/P<sub>i</sub></i>) = (1/ln 2) ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> ln(<i>Q<sub>i</sub>/P<sub>i</sub></i>)</p><p>and if <i>x</i> = <i>Q<sub>i</sub>/P<sub>i</sub></i>,</p><p>∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>(<i>Q<sub>i</sub>/P<sub>i</sub></i>) ≤ (1/ln 2) ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i>(<i>Q<sub>i</sub>/P<sub>i</sub></i> − 1) (11)</p><p>As the coefficients <i>Q<sub>i</sub></i> are probability values, they fit the normalizing condition ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>Q<sub>i</sub></i> ≤ 1, and it is also true that ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> = 1. Then</p><p>∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>(<i>Q<sub>i</sub>/P<sub>i</sub></i>) ≤ (1/ln 2) (∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>Q<sub>i</sub></i> − ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i>) ≤ 0 (12)</p><p>If now the probabilities <i>Q<sub>i</sub></i> adopt equally likely values <i>Q<sub>i</sub></i> = 1/<i>M</i>,</p><p>∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>[1/(<i>MP<sub>i</sub></i>)] ≤ 0, so that <i>H(X)</i> = ∑<sub><i>i</i>=1</sub><sup><i>M</i></sup> <i>P<sub>i</sub></i> log<sub>2</sub>(1/<i>P<sub>i</sub></i>) ≤ log<sub>2</sub>(<i>M</i>) (13)</p><p>In the above inequality, equality occurs when log<sub>2</sub>(1/<i>P<sub>i</sub></i>) = log<sub>2</sub>(<i>M</i>), which means that <i>P<sub>i</sub></i> = 1/<i>M</i>.</p><p>The maximum value of the entropy is then log<sub>2</sub>(<i>M</i>), and it occurs when all the symbols transmitted by a given source are equally likely. The uniform distribution corresponds to maximum entropy.</p><p>In the case of a binary source (<i>M</i> = 2), and assuming that the probabilities of the symbols are the values</p><p><i>P</i><sub>0</sub> = α, <i>P</i><sub>1</sub> = 1 − α (14)</p><p>the entropy is equal to</p><p><i>H(X)</i> = Ω(α) = α log<sub>2</sub>(1/α) + (1 − α) log<sub>2</sub>[1/(1 − α)] (15)</p><br><p>This expression is depicted in Figure 1.4.</p><p>The maximum value of this function occurs when α = 1 − α, that is, α = 1/2, so that the entropy is equal to <i>H(X)</i> = log<sub>2</sub>(2) = 1 bit per symbol (that is, one bit per binary digit or binary symbol).</p><p>When α → 0 or α → 1, the entropy tends to zero. The function Ω(α) will be used to represent the entropy of the binary source, evaluated using logarithms to base 2.</p><br><p><i>Example 1.3:</i> A given source emits <i>r</i> = 3000 symbols per second from a range of four symbols, with the probabilities given in Table 1.2.