It is in fact a binary symmetric channel (BSC). Such channels are not often encountered in practice, but the BSC can serve as the basic building block for modeling even channels with memory. The next problem is to form an equivalent channel for channels connected in cascade. Its matrix is obtained by multiplying the corresponding transition matrices. By induction, it is easily proved that in the general case the matrix of the equivalent channel is the product of the matrices of the channels in cascade.
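The cascade rule can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`bsc`, `mat_mul`, `cascade`) and the crossover probability 0.1 are illustrative assumptions, not taken from the text.

```python
def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]


def mat_mul(a, b):
    """Multiply two transition matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cascade(channels):
    """Equivalent transition matrix of channels connected in cascade."""
    result = channels[0]
    for channel in channels[1:]:
        result = mat_mul(result, channel)
    return result


# Two identical sections: the equivalent crossover probability is 2 * p * (1 - p).
two_sections = cascade([bsc(0.1), bsc(0.1)])

# Many identical sections: all entries approach 0.5 (assumed p = 0.1).
many_sections = cascade([bsc(0.1)] * 50)
print(two_sections)
print(many_sections)
```

The product of two BSC matrices is again a BSC matrix, which is why the equivalent channel of identical BSC sections can be described by a single crossover probability.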

For identical channels the Fig.

### Drajić, Dušan

Further, it can also be proved that as the number of cascaded BSCs increases, the elements of the equivalent transition matrix converge to 0.5. Channels, like sources, can be extended by considering sequences of n channel symbols as the symbols of the n-th extension. This model corresponds to a communication system with feedback, in which the receiver can inform the transmitter whether the received symbol was received correctly or not.

If the erased symbol (usually denoted by E) is received, the transmitter repeats the symbol until it is correctly received. The probability of this event is 1 - p, and the corresponding signaling rate should be multiplied by the same factor. Consider a discrete channel without memory described by the transition matrix P with dimensions r × s. Its elements can also be called transition probabilities. Of course, the elements of each row must sum to one. In general, the input probabilities cannot always be calculated from the transition and output probabilities alone.

Consider the next trivial example. Of course, it is an extreme case. There are two sets of input probabilities. Of course, it is known at the receiving end. The a posteriori input probabilities can be calculated as a function of the a priori input probabilities and the transition probabilities (Bayes' rule).
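As an illustration of how a posteriori probabilities follow from a priori and transition probabilities, here is a minimal sketch using Bayes' rule. The function name `posterior`, the a priori probabilities (0.7, 0.3) and the BSC with crossover probability 0.1 are assumptions chosen for the example.

```python
def posterior(prior, P):
    """Return (out, post) for a priori probabilities and transition matrix P.

    P[i][j] is P(y_j | x_i); out[j] is P(y_j); post[j][i] is P(x_i | y_j).
    """
    r, s = len(P), len(P[0])
    out = [sum(prior[i] * P[i][j] for i in range(r)) for j in range(s)]
    post = [
        [prior[i] * P[i][j] / out[j] if out[j] > 0 else 0.0 for i in range(r)]
        for j in range(s)
    ]
    return out, post


# Assumed example: binary source over a BSC with crossover probability 0.1.
out, post = posterior([0.7, 0.3], [[0.9, 0.1], [0.1, 0.9]])
print(out)   # output probabilities P(y_j)
print(post)  # a posteriori probabilities P(x_i | y_j), one row per y_j
```

Each row of `post` sums to one, since after a given symbol is received some input symbol must have been emitted.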

The sum of the last four probabilities equals one. Entropy, defined in Chap. However, after the reception of a specific symbol y_j, the a posteriori entropy of the set X is H(X|y_j). In fact, H(X) is an average measure, per symbol, of the receiver's uncertainty about the emitted source symbols before any symbol is received, while H(X|y_j) is a measure of the remaining a posteriori uncertainty about the emitted symbol when the specific symbol y_j is received.

It can be called the partial a posteriori entropy. The conclusion is that this uncertainty can increase after the reception of some specific symbol. Now it can be defined mathematically how information is transmitted through the channel. Consider the channel output. Considering the last row in the above expression, it can easily be concluded that the mutual information equals zero if the input and output symbols are statistically independent in pairs, i.e., if P(x_i, y_j) = P(x_i) P(y_j).

This conclusion verifies once more that the approach to defining the quantity of information is intuitively correct: when the input and output symbols are statistically independent in pairs, no information is transmitted. On the other hand, the mutual information is maximal if the a posteriori entropy equals zero, because then all the emitted information is transmitted.

Besides the entropies defined above, three more can be defined. Let input a priori 1. Now the last step in defining the channel capacity follows. This means that the mutual information also depends on the way the channel is used, i.e., on which source is at the channel input. However, a quantity depending only on the channel should be found. The number of input symbols is fixed (r), so such sources can differ only in their symbol probabilities.
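To make the dependence on the input probabilities concrete, the sketch below computes the mutual information for a given source and channel and then maximizes it over the input probabilities by a simple grid search. It is restricted to channels with two input symbols; the function names and the grid resolution are assumptions made for illustration, and a grid search is only one simple way to approximate the maximum.

```python
from math import log2


def mutual_information(prior, P):
    """Mutual information I(X;Y) in bits; P[i][j] is P(y_j | x_i)."""
    r, s = len(P), len(P[0])
    out = [sum(prior[i] * P[i][j] for i in range(r)) for j in range(s)]
    total = 0.0
    for i in range(r):
        for j in range(s):
            joint = prior[i] * P[i][j]
            if joint > 0:
                total += joint * log2(P[i][j] / out[j])
    return total


def binary_input_capacity(P, steps=1000):
    """Approximate the capacity of a two-input channel by a grid search over P(x_1)."""
    return max(
        mutual_information([k / steps, 1 - k / steps], P) for k in range(steps + 1)
    )


bsc = [[0.9, 0.1], [0.1, 0.9]]          # assumed example channel
useless = [[0.5, 0.5], [0.5, 0.5]]      # output independent of input

print(mutual_information([0.7, 0.3], bsc))      # depends on the source
print(binary_input_capacity(bsc))               # depends on the channel only
print(mutual_information([0.3, 0.7], useless))  # zero: independent input and output
```

For the symmetric channel the maximum is reached for equiprobable input symbols, which the grid contains exactly.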

Engineers prefer to know the maximum quantity of information that can be transmitted over the channel in one second. Some authors (especially in courses on probability) call I_max the channel capacity. However, symbol rates range from a few thousand symbols per second to terasymbols per second. In reality, the information rate flux Problems 4.

Here only two interesting cases will be given. Consider a discrete channel where the number of symbols is r. A further hypothesis is that there is no noise in the channel (noiseless channel). After the reception of any symbol there would not be any ambiguity, because there is neither noise nor intersymbol interference. According to the previous discussion, the entropy here is maximum when the symbols are equiprobable. A very important channel is the so-called continuous channel with additive noise. Here the source is continuous, the channel is continuous, and the output, being the sum of signal and noise, is continuous as well.

The corresponding entropy is defined in Chap. Now, a hypothesis about the noise (its probability density) should be made. The worst case, as explained in Chap. The output signal, the sum of the input signal and the statistically independent noise, will have the maximal entropy if its probability density is Gaussian.

To obtain it, the input signal should be Gaussian as well, because the sum of independent Gaussian processes is a Gaussian process. When applying the above expression for the channel capacity, it is very important to keep in mind the conditions assumed.

## Information Theory and Coding - Solved Problems

The channel is considered with additive Gaussian noise, and the capacity is achieved if the input signal is Gaussian as well. Therefore, it is the maximum possible value of the capacity. In reality, if the signals used are not Gaussian (usually some kind of baseband pulses or some kind of digital modulation), the capacity is smaller.
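The capacity expression referred to above did not survive in this text. Assuming it is the standard Shannon formula for a band-limited channel with additive white Gaussian noise, C = B log2(1 + S/N), it can be evaluated as in the following sketch. The function name, parameter names and numerical values are illustrative assumptions.

```python
from math import log2


def awgn_capacity(bandwidth_hz, signal_power, noise_power):
    """Capacity in bit/s of a band-limited AWGN channel (assumed Shannon formula).

    This is an upper bound reached only with a Gaussian input signal.
    """
    return bandwidth_hz * log2(1 + signal_power / noise_power)


# Assumed example: 3 kHz bandwidth, signal-to-noise power ratio of 1023 (about 30 dB).
capacity = awgn_capacity(3000, 1023, 1)
print(capacity)
```

For non-Gaussian signaling (baseband pulses, digital modulation) the achievable rate is smaller than this value, in line with the remark above.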

In such cases, the calculation of capacity can be very complicated. The notion of error probability is very well known. Still, it is useful to connect it with the so-called decision rule.


Consider a BSC described by transition matrix '0. Of course not! Therefore, the error probability depends not only on the transition probabilities but also on the decision rule, i.e., on the way the receiver interprets the received symbol. It should be noted that here (binary channel) the worst case is an error probability of 0.5. Now the notion of a decision rule will be defined more precisely. A decision rule D(y_j) is a function specifying a unique input symbol x_i for each output symbol y_j.

Generally, there are r^s different decision rules.


The next question is which decision rule one should choose. The answer is clear: the one yielding the minimum error probability. Therefore, the relation between the decision rule and the error probability should be found. Let the symbol y_j be received. Of course, if there are several such symbols, any of them can be chosen. To apply this rule, the a posteriori probabilities P(x_i|y_j) should be found. Put simply, for every received symbol y_j the decision is that the symbol x_i was emitted for which the conditional probability in the corresponding column of the transition matrix has the maximum value. With this rule the minimum error probability is obtained if all input symbols are equiprobable.
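The difference between choosing the largest transition probability in each column (maximum likelihood, ML) and choosing the largest a posteriori probability (MAP) can be sketched as follows. The function names, the a priori probabilities (0.9, 0.1) and the crossover probability 0.2 are assumptions chosen so that the two rules disagree.

```python
def ml_rule(P):
    """For each output symbol j, pick the input i with the largest P(y_j | x_i)."""
    r, s = len(P), len(P[0])
    return [max(range(r), key=lambda i: P[i][j]) for j in range(s)]


def map_rule(prior, P):
    """For each output symbol j, pick the input i with the largest P(x_i) P(y_j | x_i)."""
    r, s = len(P), len(P[0])
    return [max(range(r), key=lambda i: prior[i] * P[i][j]) for j in range(s)]


def error_probability(prior, P, rule):
    """Error probability of a decision rule given as a list: rule[j] is the decided input."""
    correct = sum(prior[rule[j]] * P[rule[j]][j] for j in range(len(P[0])))
    return 1 - correct


prior = [0.9, 0.1]                 # assumed, strongly non-equiprobable source
P = [[0.8, 0.2], [0.2, 0.8]]       # assumed BSC with crossover probability 0.2

rule_ml = ml_rule(P)
rule_map = map_rule(prior, P)
pe_ml = error_probability(prior, P, rule_ml)
pe_map = error_probability(prior, P, rule_map)
print(rule_ml, pe_ml)
print(rule_map, pe_map)
```

With these assumed numbers the MAP rule always decides for the more probable input symbol and halves the error probability relative to ML; for equiprobable inputs the two rules coincide.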

The decision process described above can also be called hard decision Problems 4. This means that the final decision about the received symbol is made in the demodulator (detector). However, it is better to quantize the output signal into several levels. In this case one speaks of soft decision Problems 3. The drawback is that more bits have to be used. In fact, in digital transmission, the channel input is discrete. Because noise and other interference are superimposed, the channel output is continuous. The positions of the thresholds influence the error probability.

These positions should be chosen to minimize the error probability. Up to now, channels without memory have been considered. As said earlier, in a channel with intersymbol interference the error probability of a received symbol depends on the adjacent symbols as well, which introduces memory into the channel. It is obvious that a channel without intersymbol interference and with additive Gaussian noise can be modeled as a BSC. However, in such channels sometimes Fig. Gilbert [20] proposed a simple model. In every state the channel is considered as a BSC, so only one parameter is sufficient to describe it.

The state diagram is shown in Fig. There are three independent parameters: p L and two state-to-state transition probabilities (the other two are then fixed). The average duration of the time intervals (sojourn time) in the good state corresponds to the intervals without impulsive noise (without errors in the Gilbert model). The average duration of the time intervals in the bad state corresponds to the intervals with impulsive noise. Now the time has come to consider the true meaning of the channel capacity. It is given by the second Shannon theorem. Its complicated proof is omitted here.

This result is unexpected. Reliable transmission over an unreliable channel is possible! It should be noted that transmission without any error is not possible; the theorem only means that the error probability can be kept as low as one wishes. This goal can be achieved using error control coding, considered in detail in the next five chapters. To get some insight, an elementary example is given here: error control coding using repetitions. Consider a memoryless binary source emitting the symbols 0 and 1.

Usually it is a high error probability. That means that three channel bits are used for every information bit. Of course, the corresponding information rate is three times smaller. At the receiving end majority logic is implemented: three bits are decoded as one or zero according to the greater number of ones or zeros. It is in fact the MAP criterion. Therefore, single errors in a code word are detected and corrected. The number of repetitions can be increased further: P_e ≈ 4 × 10^-7 for sevenfold repetition and P_e ≈ 10^-8.
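The trade-off between error probability and code rate for repetition coding with majority logic can be tabulated with a few lines of code. This is a sketch only: the channel bit error probability p = 0.01 is an assumption (the value used in the original example is not recoverable here), and the function name is illustrative.

```python
from math import comb


def repetition_error_probability(n, p):
    """Error probability after majority decoding of an n-fold repetition (n odd).

    A decoding error occurs when more than half of the n bits are received in error.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range((n + 1) // 2, n + 1)
    )


p = 0.01  # assumed channel bit error probability
for n in (1, 3, 5, 7, 9):
    # Code rate is 1/n: n channel bits are spent on every information bit.
    print(n, 1 / n, repetition_error_probability(n, p))
```

Each additional pair of repetitions lowers the error probability by roughly two orders of magnitude for this assumed p, but the code rate 1/n falls toward zero at the same time.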

The corresponding results are shown using bold points in Fig. Note the reciprocal values on the abscissa. From this figure it is easy to understand the essence of error control coding and the corresponding problems. Therefore, the logical question is: is there some way to decrease the error probability without such a decrease in code rate?

The second Shannon theorem gives exactly such a positive answer. Problem 4. The corresponding channel is described by transition matrix 0. Solution a) Entropy and information rate are the source characteristics. Finding P. Simply, this rule leads to the choice of the input symbol for which the greatest a posteriori probability of the output symbol is obtained.

However, the decision rule must ensure that different input symbols are decided for different output symbols. In this case the decision can be based on the a posteriori probabilities, or one has to choose the rule giving a smaller error probability, i. Problem 4. For the same values of p, find the decision rule yielding the minimum error probability as well as the probability of error in this case.

Draw the error probability as a function of p for the cases when the maximum a posteriori probability (MAP) rule is used and when the maximum likelihood (ML) rule is used. The transmitted information as a function of p is shown in Fig. It is obvious that the minimum error probability Fig. Solution a) The transmitted information for a binary asymmetric channel with transition matrix P is Fig. The maximum is Fig. Channel consisting of N identical sections. The receiver making decisions on a one-sample basis, with the threshold set to zero; 5.

The user receiving the reconstructed bit sequence. How can it be achieved? How can the capacity be achieved in this case? Solution a) The system block scheme is shown in Fig. The maximum signaling rate was chosen so as to avoid intersymbol interference [22], i. The channel capacity can be used in full if the source at the channel input emits equiprobable symbols at twice the rate of the earlier source. If the line encoder and decision block are removed, a continuous channel with noise is obtained. In this case, the process at the channel (linear system) output has a Gaussian distribution as well.

In the next problem, it will be shown that by applying a different decision rule at the receiver, this difference could be made smaller. Information source emitting symbols Ji, Binary encoder performing Huffman coding. Channel consisting of two identical sections. Each one can be considered as a binary symmetric channel.

Total error probability is 0. Find the probability of ones and zeros at the encoder output. On the basis of the five given probabilities, the probability of the sixth symbol is Fig. S'j, which is, of course, impossible because the encoder cannot introduce new information. This illogical result is a consequence of the assumption that the sequence at the encoder output is uncorrelated, which is in fact not true.

Therefore, the capacity calculation is carried out by maximizing the transmitted information by the choice of the symbol probabilities at the channel input. It can be noticed that for the information transmission a more unreliable channel can be used if the encoder efficiency is smaller. In the case when the encoder is directly at the channel input, the redundancy introduced by the encoder allows the reliable transmission over the unreliable channel.

In other words, without the redundancy, the information without any distortion can be transmitted over the completely reliable channel only. This feature will be used in theory of error control codes where the redundancy will be intentionally introduced to make possible the information transmission over very unreliable channels. On the receiving end is the error control decoder using majority logic decision.

Calculate the transmitted information over the equivalent channel. It is obvious that the error probability depends only on the channel characteristics (for a fixed decision rule), while the transmitted information depends on the source features (parameter a) as well. The reader should calculate the information rate through the equivalent channel (encoder, channel and decoder). Can it be increased by increasing n if the transmission bit rate through the channel is limited?

It should be noticed that the signaling rate in this channel is Vj! This time can be decreased if a sufficient source extension is done, after that by the use compression encoding and after the repetition encoding as well. On the other hand. Information source emitting the symbols sq, Binary Huffman encoder and the corresponding decoder at the receiving end. Error control encoder using threefold repetition and the corresponding decoder with majority logic at the receiving end these blocks are included optionally.

Find the following: a) the source information rate and the code efficiency for Huffman encoding; b) the information rate of the equivalent source at the channel input and the information rate of the source adjoined to it (draw the system block scheme); c) the channel information rate and the error probability without error control coding. Comment in particular on how error control coding influences the information rate! It is obvious that the information rate at the channel input i. Table 4. Otherwise, the error will occur for binary zero transmission.

If the binary rate in the channel cannot be increased (the channel is the same), the information rate at the error control coder must be decreased three times.


It is also desirable that the signaling rate be as small as possible for a given information rate. Source encoding achieves non-destructive (lossless) compression and reduces the difference between these two rates, so that the demands on the frequency band are partially lessened. Solution The signal amplitude probability density function at the channel output depends on the voltage level emitted by the line encoder. It is shown in Fig. If the received signal is between the thresholds, the decision would not be sufficiently reliable.

In this case the symbol is considered as erased, i. It is obvious that for every value of P(x_i), there are the Fig. The dependence of the transmitted information on the input signal probabilities is shown in Fig. From the line encoder the signal is emitted over the channel with additive white Gaussian noise (standard deviation σ). Solution a) Because of the quantizer, the complete voltage range is divided into s non-overlapping intervals, each corresponding to one discrete symbol.

It is obvious that with the increased number of quantization levels, the transmitted information increases as well, but after some number of levels, the saturation is achieved. The corresponding numerical values are shown in Fig. It is obvious that for all signal-to-noise ratio values the increasing of the number of quantization levels increases the transmitted information as well.

In this way it is confirmed in practice that soft decision is superior to hard decision, where the received signal is compared to one threshold value only. It is also interesting to find the capacity of the continuous channel with noise, i.e., the channel corresponding to the part of the system from the line encoder output to the quantizer input. Although the frequency band is not given in this problem, it is obvious that the signaling is in the baseband and the condition v miiy JX.

Of course, the continuous channel capacity is always greater than that of the discrete channel adjoined to it. However, it should be noted that the information transmitted over a discrete channel cannot be greater than the entropy of the discrete source at its input. Its n-th order extension is fed to the line encoder input. At the channel output are the quantizer (M levels) and the corresponding decision block. Find the entropy of the equivalent source. The dependence of the transmitted information on the signal-to-noise ratio in the channel is shown in Fig.

When the number of discrete symbols at the input and at the output of the channel increases, the discrete channel capacity could converge to the continuous channel capacity, which does not seem to be completely the case here. The basic problem of the above analysis is that it is limited to equiprobable input symbols only. It should be noticed that in this case the channel is not completely symmetric, which can easily be seen from the graph or from the transition matrix.

This implies that it is optimal to choose the probabilities P2 Pi Table 4. Each of these probabilities should be in the range from 0 to 0. This is logical: for strong noise, more distant levels should be used to reduce the influence of the noise. Of course, an additional gain is possible if, for a fixed number of signaling levels at the channel input, an additional quantization is performed at the receiving end in such a way that the number of symbols at the output is greater than at the input, i.

The bandwidth is chosen so as to eliminate the intersymbol interference while at the same time obtaining the minimum noise power at the channel output. The BPSK signal receiver uses an integrate-and-dump circuit. By which discrete model can the corresponding channel error sequence be modeled? Find the channel average error probability. Solution The channel is modeled as an ideal pass-band filter.

It can be easily found Fig. The corresponding error sequence (it can be obtained by adding modulo 2 the sequences of transmitted and received bits) is For this reason, the normalization is carried out to allow a more noticeable difference in the function shape. Of course, in the second chapter of this book it was shown that the autocorrelation function of a discrete sequence must be discrete, but in this case, to allow a clearer insight, only the envelope of the function is drawn.

## Chapter 5 Block Codes

### Brief Theoretical Overview

In the remaining part of this book error control codes will be considered. As said earlier, these codes enable detection and possible correction of transmission errors. Usually, it is supposed that at the encoder input there is a series of bits, statistically independent and equally probable. It can be a result of previous data compression or scrambling. To detect and correct the errors some redundancy should be added. Depending on the way the redundancy is added, the error control codes are divided into two families: block codes and convolutional codes.

Block codes were invented before convolutional codes. The latter will be considered in the next chapter. As the name suggests, block encoding consists of taking a block of input bits (k bits) and representing it by a block of output bits (n bits). Such a code is denoted as an (n, k) code. This table will have 2^k rows. For greater values of k this is not practical, and some rule should be used to obtain the code word from the k input bits.

Further, the corresponding decoding rule is needed: how to obtain the information bits from the received code word. In practice the systematic codes are especially interesting, where the block comprising k information bits is not changed in the code word and n - k control bits are added, forming a code word of length n. In the following, it is primarily supposed that binary codes are considered. This chapter deals primarily with the so-called linear block codes. At the beginning, some elementary notions will be introduced. Repetition codes Problem 5. At the end of Sect.

Here, only the fivefold repetition will be analyzed. Various decoding strategies can be used. There are 32 possible received words. According to one decision rule (all rules are MAP), majority logic is implemented. Five bits are decoded as one or zero according to the greater number of ones or zeros. Therefore, all single errors and double errors in a code word are detected and corrected.

The feedback channel is not needed. According to the next decision rule, only 00000 and 11111 are decoded as 0 or 1. In this case all single, double, triple and fourfold bit errors are detected. But a third decision rule is possible as well. Besides the combinations 00000 and 11111 (no errors!), the combinations differing from them in a single bit are decoded as well. For the other combinations, having 2 or 3 zeros (and therefore 3 or 2 ones), retransmission is requested. It means that single errors are corrected, and all double and triple errors are detected. It is a hybrid procedure. In fact, these rules are based on the Hamming distance d Problems 5.

For binary codes the Hamming distance between two binary sequences of the same length is defined as the number of places (bits) in which they differ. It has the properties of a metric. A discrete five-dimensional space can be conceived, having a total of 32 points, two of which are code words. Their distance is 5. According to the ARQ rule, only the received words identical to code words are decoded; in the other cases retransmission is requested. It is illustrated in Fig. According to the FEC (majority) rule, the received word is decoded as the nearer code word in the Hamming sense, as in Fig.

For the third decision rule (hybrid procedure), the space is divided into three subspaces. The points in the subspaces around the code words are at distance 1 from the code words, and single errors are corrected. In the third subspace, in the middle, are the points having Hamming distance 2 or 3 from the code words, and double and triple errors are detected Fig.
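The three decoding strategies for the fivefold repetition code differ only in the radius of the decoding sphere around each code word. The sketch below expresses this with the Hamming distance; the strategy labels (`"arq"`, `"hybrid"`, `"fec"`) and function names are naming assumptions made for this example.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


CODE_WORDS = {"0": "00000", "1": "11111"}

# Radius of the decoding sphere around each code word for every strategy.
RADIUS = {"arq": 0, "hybrid": 1, "fec": 2}


def decode(received, strategy):
    """Return the decoded bit, or None when retransmission is requested."""
    for bit, word in CODE_WORDS.items():
        if hamming_distance(received, word) <= RADIUS[strategy]:
            return bit
    return None


for word in ("00000", "00100", "01010", "11011"):
    print(word, [decode(word, s) for s in ("arq", "hybrid", "fec")])
```

Because the two code words are at distance 5, spheres of radius up to 2 never overlap, so at most one code word can match. With radius 2 every one of the 32 received words is decoded (a complete decoder), while smaller radii leave a region in which errors are only detected.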

A decoder that decodes every received word as some code word (final decision) is called a complete decoder. Therefore, it is easy to find the direct connection between the Hamming distance and the numbers of detectable and correctable errors of a code. Of course, for a code with more than two code words, the worst case should be considered, i. Codes using a single parity check are probably the oldest error control codes. The total number of ones is even. These codes are described and analyzed in detail in the mentioned problems. Therefore, this code can correct all single errors, but not any combination of two errors in the code word.

A code that, for some e, can correct all combinations of e or fewer errors and no others is called a perfect code Problem 5. Besides the Hamming codes, the Golay code (23, 12) is known to be perfect; it can correct all single, double and triple errors. Perfect codes satisfy the Hamming bound with equality Problems 5. There is also a Singleton bound Problem 5. Any code satisfying that bound with equality is called a maximum distance separable (MDS) code Problem 5. Varshamov-Gilbert bound Problem 5. The obtained code can detect practically all even numbers of errors.
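Whether a binary (n, k) code correcting e errors meets the Hamming bound with equality can be checked by counting: the 2^k spheres of radius e around the code words must exactly fill the space of 2^n words. The following sketch performs that count; the function names are illustrative assumptions, and the check is only a necessary counting condition on the parameters.

```python
from math import comb


def sphere_size(n, e):
    """Number of binary words of length n within Hamming distance e of a given word."""
    return sum(comb(n, i) for i in range(e + 1))


def meets_hamming_bound_with_equality(n, k, e):
    """True if 2^k spheres of radius e exactly fill the space of 2^n binary words."""
    return 2**k * sphere_size(n, e) == 2**n


print(meets_hamming_bound_with_equality(7, 4, 1))    # Hamming (7, 4) parameters
print(meets_hamming_bound_with_equality(23, 12, 3))  # Golay (23, 12) parameters
print(meets_hamming_bound_with_equality(5, 1, 2))    # fivefold repetition parameters
print(meets_hamming_bound_with_equality(8, 4, 1))    # extended Hamming (8, 4) parameters
```

The first three parameter sets satisfy the equality, while the (8, 4) parameters with e = 1 do not.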

A code can be shortened as well by omitting some information symbols. By adding a general parity check, the codes (8, 4), (16, 11), (32, 26), (64, 57) etc. are obtained. This is very suitable in view of the byte structure of data. In a previous consideration it was mentioned that, in general, any block code can be defined by the corresponding table comprising k-tuples of input information bits and the corresponding n-tuples of output encoded bits (code words).

Therefore, there are in total possible codes (of course, some of them are not good!). For greater values of n and k this is not practical, and some rule should be used to obtain the code word from the k input bits. It means that some simple rule should be defined. With the help of discrete mathematics (the apparatus of abstract algebra) such a class was found. In this way it is possible to construct codes having the needed characteristics (Hamming distance, etc.). These codes are linear block codes.

This class of codes is defined by imposing a strong structural property on the codes. This structure provides guidance in finding good codes and helps in the realization of practical encoders and decoders. At the end of this chapter a short overview of the corresponding part of abstract algebra is included. The definition of a linear block code is very simple. Consider a finite field, the Galois field GF(q), i. In the field two operations are defined, addition and multiplication (both are commutative).

Sequences of n field elements (vectors) form a vector space V of dimension n over the field. In this space vector addition is defined, where the corresponding vector elements are added according to the rules of the field. The set V is a commutative group under vector addition. Scalar multiplication is defined as well where vectors i. If the inner product equals zero (the 0 of the field), the corresponding vectors are said to be orthogonal.

The definition of a linear code is as follows: a linear code is a subspace of the vector space over GF(q). In this chapter mainly the binary field GF(2) will be considered, but the theory is general and can be applied to linear codes with q different symbols, if these symbols can be connected with the symbols of GF(q). It is very convenient that for any exponent of 2 there exists a linear block code.

This approach to describing codes is often called algebraic coding theory. Of course, block codes can be formed for any number of symbols, but then they are not linear. The vector spaces over finite fields can also be used to define code words that are not the elements of a subspace. Also, at the end of this chapter there are two problems not connected to linear block codes. The arithmetic block codes Problem 5. These relations are usually very simple. For integer codes Problem 5. From these candidates 2^k code words should be chosen. If these words are chosen as a subspace of the considered space, a linear block code is obtained.

The identity element is 0 = (0, 0, ..., 0). The subgroup is closed under addition, so the sum of code words is a code word as well. The Hamming distance between two code words (vectors) equals the Hamming weight (i.e., the number of ones) of their sum. It further means that for any two code vectors there exists a code vector obtained by their summation, and the Hamming weights of the code vectors are at the same time the possible Hamming distances in the code.

Therefore, to find the minimum Hamming distance of a linear code, the code word having the minimum Hamming weight should be found (of course, not the all-zeros code word!). It specifies the number of code words that have the same Hamming weight. Consider the Hamming code (7, 4) (a Hamming code is a linear code!). The corresponding weight spectrum, written as (weight, number of code words), is (0, 1), (3, 7), (4, 7), (7, 1), shown in Fig. A set of vectors is said to span a vector space if every vector of the space equals at least one linear combination of these vectors.

The number of linearly independent vectors is called the dimension of the vector space. Any set of linearly independent vectors spanning the same vector space is called a basis. Therefore, the same vector space can have several bases. Generally, a group can have nontrivial subgroups. Similarly, a vector space can have subspaces. The subspace dimension is smaller than the space dimension. For an (n, k) code the vector space dimension equals n and the code subspace dimension equals k. The subspace can have several bases as well.


The subspaces can be orthogonal (the dot products of the vectors from their bases equal zero). Consider an (n, k) code. The vectors of its basis can be arranged in matrix form, with dimensions k × n. It is the generator matrix Problems 5. Its rows are linearly independent. The matrix rank is k. Therefore, in this case the code will be the same, only the basis would change.

Further, by permuting the matrix columns, an equivalent code Problems 5. This code will have the same weight spectrum as the previous one, i. Consider the linear block code (5, 2). Code words are , , i , i. Such a mathematical description provides very simple generation of code words. Therefore, a code word must be obtained. Further, the rows of G are linearly independent, and to any specific information vector i there corresponds a code word v, different from the other code words. This number is 2^k - 1.
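Generating code words from a generator matrix, and reading the weight spectrum and minimum distance off the resulting code, can be sketched as follows. The matrix `G` below is one possible systematic generator matrix of a (7, 4) Hamming code, chosen for illustration; it is an assumption and not necessarily the matrix used in the original problems.

```python
from itertools import product

# Assumed systematic generator matrix G = [I | P] of a (7, 4) Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(info, G):
    """Code word v = i G over GF(2) for the information vector i."""
    return tuple(
        sum(info[row] * G[row][col] for row in range(len(G))) % 2
        for col in range(len(G[0]))
    )


def code_words(G):
    """All 2^k code words generated by G."""
    return [encode(info, G) for info in product((0, 1), repeat=len(G))]


def weight_spectrum(G):
    """spectrum[w] is the number of code words of Hamming weight w."""
    spectrum = [0] * (len(G[0]) + 1)
    for word in code_words(G):
        spectrum[sum(word)] += 1
    return spectrum


def minimum_distance(G):
    """Minimum Hamming weight over the nonzero code words of a linear code."""
    return min(sum(word) for word in code_words(G) if any(word))


print(weight_spectrum(G))
print(minimum_distance(G))
```

Because the code is linear, the minimum distance is found from the weights of the 2^k - 1 nonzero code words alone, without comparing all pairs of code words.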

As waves radiate outward when a stone is dropped into water, so music has a natural immutable form that cannot be changed; it arises from the initial disturbance of sound, just as the waves in the pond do.

So the music is simply playing itself out (recursively), and the job of the composer is to find the "sweep of improvisation" that plays out the initial disturbance. His central concept is the "chord of nature"; he taught that all music arises from and falls back into this chord, just as a feedback version of Fourier series and interference patterns arises from and then recedes back into a Chinese gong when it is struck.

The chord itself is based on physics and is well known to scientists. But here we see an example of coding theory. That is, if the music is already there, and is simply moving back to equilibrium, then it is really looking for the most efficient path to that chord. It should not waste any energy, so to speak, and this is what gives it its beauty and natural artistry, just as rain or snow has a natural artistry. So the most profound music is an encoding of sound information that matches the Shannon entropy.

There is probability inherent in this in the sense that sound can be thought of as Brownian motion, and the waves as global effects of the Brownian motion. Note that there is no error correction in music, though. If you miss it, you miss it. It would be interesting to see what Schenker would have said about communication in the presence of lots of other radios playing, horns honking, and noise in general. Perhaps he would have come up with a theory of Music in the Presence of Noise, just as Shannon did for other types of information.

One author asked the question "why does Copernicus seem more right than the theorists that went before him?" He concluded that the human race feels that Copernicus is more right because his ideas are simpler: the key is simplicity. So one could say that he found a more efficient encoding of the information of the solar system, and that the human race is passionate about coding theory without even realizing it! One can argue that what gives these statements their force is efficiency of expression; they are an interpolation of many feelings, attitudes and perceptions; they are an efficient encoding of emotional and mental information.

At the time of this writing cell phones are important. But cellphone text messaging costs money. So how do you minimize costs? In some third world nations, people are actually developing elaborate languages for data transmission that help them save money. For instance, one may encode the message "talk to you at 7 o'clock" as "tk 2 u 7" and so on. This is effective in the situation that Claude E. Shannon described as Communication Without Noise. That is, as long as the message is properly received, no harm is done.

But Shannon also stated that because of the presence of noise, some redundancy may be important. He felt that this occurred by the natural function of the mind, the purposeful addition of excessive information that allows for error correction - by the listener or reader. Animals also encode information efficiently in various ways. Chirps, squawks, growls, tweets, meows, woofs and even elephants' pounding on the ground in order to communicate with other elephants far away; these are all examples of encodings of information that attempt to be as efficient as possible.

At this point an interesting question about cellphone technology becomes apparent: is there a single optimal encoding method for cellphone communication? In order to answer that effectively, one would have to average all messages sent (in a region, for example), then translate the result into binary, then construct a stochastic process (essentially a computer that sends an average message in terms of probabilities of various patterns occurring) that is essentially equivalent to those messages.

At that point the entropy could be calculated and a matching code developed. But the lingering problem is this: given a communication pattern (a stochastic process), how can you generate an efficient encoding scheme? In summary, one is looking for a black box that receives a stochastic process and produces a code.
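For the simplest case, a memoryless source with known symbol probabilities, such a black box already exists in classical form. The sketch below is only an illustration under that assumption: it computes the Shannon entropy of a made-up distribution `probs` and builds a Huffman code for it. Huffman coding is used here as one well-known construction, not as the method the text has in mind, and the function names are hypothetical. A general stochastic process with memory would need more than this.

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy in bits per symbol of a memoryless source."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Build a binary prefix code from a dict {symbol: probability}."""
    if len(probs) == 1:
        return {s: "0" for s in probs}
    # Heap entries are (probability, tie_breaker, {symbol: partial code word}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

def average_length(probs, code):
    """Expected code word length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in probs)

# Made-up example distribution (an assumption for illustration only).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

if __name__ == "__main__":
    code = huffman_code(probs)
    print("code:", code)
    print("entropy:", entropy(probs))
    print("average length:", average_length(probs, code))
```

Because every probability in this example is a power of 1/2, the average code word length equals the entropy exactly. For other distributions the Huffman average length lies between the entropy and the entropy plus one bit.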

## Information theory and coding - solved problems

One can design a neural network that can essentially write programs by itself. It is possible to train a network on a large number of randomly chosen stochastic processes, while providing an increased reward for producing a more efficient code, and a decreased reward for producing a less efficient code. In many situations, the entropy can almost certainly be approximated.

Neural networks are interpolations of data. As they learn the pattern they can be more finely tuned and so on. However, this is not the fun kind of interpolation that may be taught in high school as Lagrange interpolation; although an interesting game, that method usually involves a single variable and a few data points. On the other hand, neural networks can be quite intense, in the sense that the number of variables can be very large, and the number of data points can run into the millions.

Therefore, there is no way of using ordinary methods to interpolate that kind of data. Proponents of differential geometry have proposed that neural networks should be studied carefully in order to produce a solid mathematical theory that can be used to predict their behaviour exactly in every situation. But the task is very complex, and almost no hope of achieving the goal exists at the present time. So the general study remains as more of an experimental science.