A Course in Probability Theory, by Kai Lai Chung. Kai Lai Chung is a Professor Emeritus at Stanford University and has made important contributions to probability theory. Book reviews: A Course in Probability Theory, by Kai Lai Chung (Harcourt, New York); Probability, by Leo Breiman (Addison-Wesley).


A COURSE IN PROBABILITY THEORY, THIRD EDITION. Kai Lai Chung, Stanford University. Academic Press, a Harcourt Science and Technology Company.

We shall call the function $\delta_t$ the point mass at $t$. In later years I prepared a set of notes, lithographed and distributed in the class, to meet the need. This is a generalization of (vi). The intersection of any collection of B.F.'s is a B.F. Use Exercise 23 of Sec. …

The number of atoms of any σ-finite measure is countable. This is the case if and only if $F$ is discrete. This is the case if and only if $F$ is singular. One half is proved by using Theorems 1.…

Translate Theorem 1.… Translate the construction of a singular continuous d.f. … What if the latter has positive measure? Describe a probability scheme to realize this.

Show by a trivial example that Theorem 2.… Show that Theorem 2.… Prove that there exists a minimal B.F. … Suppose that $F$ has all the defining properties of a d.f. … What modification is necessary in Theorem 2.… for an arbitrary measure? Prove that for a measure $\mu$ … Prove that if the p.m. … Prove first that there exists $E$ with "arbitrarily small" probability. A quick proof then follows from Zorn's lemma by considering a maximal collection of disjoint sets, the sum of whose probabilities does not exceed $\alpha$.

But an elementary proof without using any maximality principle is also possible. A point $x$ is said to be in the support of a measure $\mu$ on $\mathscr{B}^1$ iff every open neighborhood of $x$ has strictly positive measure. The set of all such points is called the support of $\mu$. Prove that the support is a closed set whose complement is the maximal open set on which $\mu$ vanishes. … Let $f$ be measurable with respect to $\mathscr{F}$, and $Z$ be contained in a null set.

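Show that the conclusion may be false otherwise. A real, extended-valued random variable is a function $X$ whose domain is a set $\Delta$ in $\mathscr{F}$ and whose range is contained in $[-\infty, +\infty]$, such that $\{\omega: X(\omega) \in B\} \in \Delta \cap \mathscr{F}$ for each extended Borel set $B$. A complex-valued random variable is a function on a set $\Delta$ in $\mathscr{F}$ to the complex plane whose real and imaginary parts are both real, finite-valued random variables. This definition in its generality is necessary for logical reasons in many applications, but for a discussion of basic properties we may suppose $\Delta = \Omega$ and that $X$ is real and finite-valued. This restricted meaning will be understood below. Condition (1) then states that $X^{-1}$ carries members of $\mathscr{B}^1$ onto members of $\mathscr{F}$: for each $B \in \mathscr{B}^1$, $X^{-1}(B) \in \mathscr{F}$. Such a function is said to be measurable with respect to $\mathscr{F}$.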

Thus, an r.v. is just a measurable function from $\Omega$ to $\mathscr{R}^1$. The next proposition, a standard exercise on inverse mapping, is essential. … The preceding condition may be written as

(3) $\forall x: X^{-1}((-\infty, x]) \in \mathscr{F}$.

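From Theorem 3.…, the collection of all sets $S \subset \mathscr{R}^1$ such that $X^{-1}(S) \in \mathscr{F}$ is a B.F.; by (3) it contains every interval $(-\infty, x]$, and therefore this B.F. contains $\mathscr{B}^1$. Thus $X$ is an r.v. This proves the "if" part of the theorem; the "only if" part is trivial.

The next theorem relates the p.m. $\mathscr{P}$ to a p.m. on $(\mathscr{R}^1, \mathscr{B}^1)$. Each r.v. on $(\Omega, \mathscr{F}, \mathscr{P})$ induces a probability space $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ by means of the following correspondence:

(4) $\forall B \in \mathscr{B}^1: \mu(B) = \mathscr{P}(X^{-1}(B)) = \mathscr{P}\{X \in B\}$.

If the $B_n$'s are disjoint sets in $\mathscr{B}^1$, the $X^{-1}(B_n)$'s are disjoint sets in $\mathscr{F}$. Hence

$\mu\left(\bigcup_n B_n\right) = \mathscr{P}\left(X^{-1}\left(\bigcup_n B_n\right)\right) = \mathscr{P}\left(\bigcup_n X^{-1}(B_n)\right) = \sum_n \mathscr{P}(X^{-1}(B_n)) = \sum_n \mu(B_n)$.

Finally $X^{-1}(\mathscr{R}^1) = \Omega$, hence $\mu(\mathscr{R}^1) = 1$. Thus $\mu$ is a p.m. The collection of sets $\{X^{-1}(B), B \in \mathscr{B}^1\}$ is a B.F.; it is the smallest Borel subfield of $\mathscr{F}$ with respect to which $X$ is measurable. Thus (4) is a convenient way of representing the measure $\mathscr{P}$ restricted to this subfield. This $\mu$ is called the "probability distribution measure", or p.m., of $X$, and its associated d.f. $F$ (as in Chapter 2) is called the d.f. of $X$. While the r.v. $X$ determines $\mu$ and therefore $F$, the converse is obviously false.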

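A family of r.v.'s having the same distribution is said to be identically distributed. Let $(\Omega, \mathscr{J})$ be a discrete sample space (see Example 1 of Sec. …). Every numerically valued function is an r.v. In the case of $(\mathscr{U}, \mathscr{B}, m)$, an r.v. is by definition just a Borel measurable function. … The definition of a Borel measurable function is not affected, since no measure is involved; so any such function is an r.v.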

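As in Example 2, there exists an r.v. … We proceed to produce new r.v.'s from given ones: if $X$ is an r.v. and $f$ is a Borel measurable function on $(\mathscr{R}^1, \mathscr{B}^1)$, then $f(X)$ is an r.v. The quickest proof is as follows. Regarding the function $f(X)$ of $\omega$ as the "composite mapping" $f \circ X: \omega \to f(X(\omega))$, we have $(f \circ X)^{-1} = X^{-1} \circ f^{-1}$, and consequently

$(f \circ X)^{-1}(\mathscr{B}^1) = X^{-1}(f^{-1}(\mathscr{B}^1)) \subset X^{-1}(\mathscr{B}^1) \subset \mathscr{F}$.

We must now discuss the notion of a random vector. This is just a vector each of whose components is an r.v. It is sufficient to consider the case of two dimensions, since there is no essential difference in higher dimensions apart from complication in notation.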

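A fortiori, $\mathscr{B}^2$ is also generated by product sets of the form $B_1 \times B_2$, where $B_1$ and $B_2$ belong to $\mathscr{B}^1$. A function $f$ from $\mathscr{R}^2$ into $\mathscr{R}^1$ is called a Borel measurable function of two variables iff $f^{-1}(\mathscr{B}^1) \subset \mathscr{B}^2$. Written out, this says that for each 1-dimensional Borel set $B$, viz., a member of $\mathscr{B}^1$, we have $\{(x, y): f(x, y) \in B\} \in \mathscr{B}^2$.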

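Now let $X$ and $Y$ be two r.v.'s on $(\Omega, \mathscr{F}, \mathscr{P})$. The random vector $(X, Y)$ induces a probability $\nu$ on $\mathscr{B}^2$ as follows:

(5) $\forall A \in \mathscr{B}^2: \nu(A) = \mathscr{P}\{(X, Y) \in A\}$.

This $\nu$ is called the 2-dimensional probability distribution, or simply the p.m., of $(X, Y)$. Let us also define, in imitation of $X^{-1}$, the inverse mapping $(X, Y)^{-1}$ by the following formula:

$\forall A \in \mathscr{B}^2: (X, Y)^{-1}(A) = \{\omega: (X(\omega), Y(\omega)) \in A\}$.

This mapping has properties analogous to those of $X^{-1}$ given in Theorem 3.… We can now easily generalize Theorem 3.…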

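The last inclusion says that the inverse mapping $(X, Y)^{-1}$ carries each 2-dimensional Borel set into a set in $\mathscr{F}$. This is proved as follows. If $A = B_1 \times B_2$ with $B_1, B_2 \in \mathscr{B}^1$, then $(X, Y)^{-1}(A) = X^{-1}(B_1) \cap Y^{-1}(B_2) \in \mathscr{F}$. Now the collection of sets $A$ in $\mathscr{R}^2$ for which $(X, Y)^{-1}(A) \in \mathscr{F}$ forms a B.F. by the analogue of Theorem 3.… It follows from what has just been shown that this B.F. contains all product sets $B_1 \times B_2$; hence it must also contain $\mathscr{B}^2$. Hence each set in $\mathscr{B}^2$ belongs to the collection, as was to be proved. Here are some important special cases of Theorems 3.…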

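Throughout the book we shall use the notation

$x \vee y = \max(x, y), \quad x \wedge y = \min(x, y)$

for numbers as well as functions. Generalization to a finite number of r.v.'s is immediate. Passing to an infinite sequence, let us state the following theorem, although its analogue in real function theory should be well known to the reader. To see, for example, that $\sup_j X_j$ is an r.v., we need only observe the relation $\forall x: \{\sup_j X_j \le x\} = \bigcap_j \{X_j \le x\}$ and use (3). Here already we see the necessity of the general definition of an r.v., since $\sup_j X_j$ may take the value $+\infty$. An r.v. $X$ is called discrete (or countably valued) iff there is a countable set $B \subset \mathscr{R}^1$ such that $\mathscr{P}(X \in B) = 1$. It is easy to see that $X$ is discrete if and only if its d.f. is. Perhaps it is worthwhile to point out that a discrete r.v. need not have a range that is discrete in the sense of Euclidean topology.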

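Consider, for example, an r.v. … The following terminology and notation will be used throughout the book for an arbitrary set $\Omega$, not necessarily the sample space. For each $\Delta \subset \Omega$, the function $1_\Delta$ defined by $1_\Delta(\omega) = 1$ if $\omega \in \Delta$ and $1_\Delta(\omega) = 0$ if $\omega \in \Omega \setminus \Delta$ is called the indicator (function) of $\Delta$. If $\{\Lambda_j\}$ is a countable partition of $\Omega$, that is, a countable family of disjoint sets in $\mathscr{F}$ whose union is $\Omega$, we have then

$1_\Omega = 1 = \sum_j 1_{\Lambda_j}$.

More generally, let $b_j$ be arbitrary real numbers; then the function $\varphi$ defined below:

$\forall \omega \in \Omega: \varphi(\omega) = \sum_j b_j 1_{\Lambda_j}(\omega)$

is a discrete r.v. We shall call $\varphi$ the r.v. belonging to the weighted partition $\{\Lambda_j; b_j\}$. Each discrete r.v. $X$ belongs to a certain partition. If $j$ ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.

Prove Theorem 3.… For the "direct mapping" $X$, which of these properties of $X^{-1}$ holds? If two r.v.'s are equal a.e., then they have the same p.m. Given any p.m. $\mu$ on $(\mathscr{R}^1, \mathscr{B}^1)$, define an r.v. whose p.m. is $\mu$. Let $\theta$ be uniformly distributed on $[0, 1]$. For each d.f. $F$, define $G(y) = \sup\{x: F(x) \le y\}$. Then $G(\theta)$ has the d.f. $F$. Suppose $X$ has the continuous d.f. $F$; then $F(X)$ has the uniform distribution on $[0, 1]$. What if $F$ is not continuous? Is the range of an r.v. necessarily Borel or Lebesgue measurable? The sum, difference, product, or quotient (denominator nonvanishing) of two discrete r.v.'s is a discrete r.v.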
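Taking $G(y) = \sup\{x: F(x) \le y\}$, the statement that $G(\theta)$ has the d.f. $F$ can be checked exactly for a step d.f. The Python sketch below is illustrative only. The helper names `make_quantile` and `grid_distribution`, the three atoms, and the grid of midpoints standing in for a uniform $\theta$ are assumptions, not part of the text. It uses exact rational arithmetic, so each atom $x_i$ should receive exactly the fraction $\mathscr{P}\{X = x_i\}$ of the grid.

```python
import bisect
from fractions import Fraction
from itertools import accumulate


def make_quantile(atoms, probs):
    """G(y) = sup{x : F(x) <= y} for the step d.f. F with the given atoms.

    atoms must be strictly increasing; probs are Fractions summing to 1.
    """
    cum = list(accumulate(probs))  # cum[i] = F(atoms[i])

    def G(y):
        # {x : F(x) <= y} = (-inf, atoms[i]), where i is the first index with cum[i] > y
        i = bisect.bisect_right(cum, y)
        return atoms[i] if i < len(atoms) else float("inf")

    return G


def grid_distribution(G, n_points):
    """Fraction of the midpoints (2t+1)/(2N) of [0, 1] that G sends to each value."""
    counts = {}
    for t in range(n_points):
        x = G(Fraction(2 * t + 1, 2 * n_points))
        counts[x] = counts.get(x, 0) + 1
    return {x: Fraction(c, n_points) for x, c in counts.items()}


atoms = [-1, 0, 2]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
dist = grid_distribution(make_quantile(atoms, probs), 1000)
print(dist)  # each atom receives exactly its probability
```

Using `bisect_right` on the cumulative sums picks the first atom whose cumulative probability exceeds $y$, which is exactly the supremum in the definition of $G$.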

If $\Omega$ is discrete (countable), then every r.v. is discrete. Conversely, every r.v. … Use Exercise 23 of Sec. … If $f$ is Borel measurable, and $X$ and $Y$ are identically distributed, then so are $f(X)$ and $f(Y)$. … Is this $B$ unique? Can there be a set $A$ … Generalize the assertion in Exercise 11 to a finite set of r.v.'s.

The concept of expectation is the same as that of integration in the probability space with respect to the measure $\mathscr{P}$. The reader is supposed to have some acquaintance with this, at least in the particular case …

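For each positive discrete r.v. $X$ belonging to the weighted partition $\{\Lambda_j; b_j\}$, we define its expectation to be

(1) $\mathscr{E}(X) = \sum_j b_j \mathscr{P}(\Lambda_j)$.

This is either a positive finite number or $+\infty$. It is trivial that if $X$ belongs to different partitions, the corresponding values given by (1) agree. Now let $X$ be an arbitrary positive r.v. For each integer $m \ge 1$ and $n \ge 0$, the set $\Lambda_{mn} = \{\omega: n/2^m \le X(\omega) < (n+1)/2^m\}$ belongs to $\mathscr{F}$. For each $m$, let $X_m$ denote the r.v. belonging to the weighted partition $\{\Lambda_{mn}; n/2^m\}$, so that $X_m = n/2^m$ if and only if $n/2^m \le X < (n+1)/2^m$. It is easy to see that $X_m \le X_{m+1}$ and $0 \le X - X_m < 1/2^m$. Consequently there is monotone convergence:

$\forall \omega: \lim_m X_m(\omega) = X(\omega)$,

and we define $\mathscr{E}(X) = \lim_m \mathscr{E}(X_m)$, the limit of an increasing sequence. It should be shown that when $X$ is discrete, this new definition agrees with the previous one.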
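The dyadic approximation $X_m = n/2^m$ on $\{n/2^m \le X < (n+1)/2^m\}$ can be computed directly. The following Python sketch is a minimal illustration. It assumes that $X$ has a continuous d.f. $F$, so that $\mathscr{P}\{a \le X < b\} = F(b) - F(a)$. It also truncates the partition at a cutoff `n_max`, a hypothetical parameter. For the uniform distribution on $[0, 1]$ it gives $\mathscr{E}(X_m) = 1/2 - 1/2^{m+1}$, which increases to $\mathscr{E}(X) = 1/2$.

```python
from fractions import Fraction


def dyadic_expectation(F, m, n_max):
    """E(X_m), where X_m = n/2^m on {n/2^m <= X < (n+1)/2^m}, for n = 0, ..., n_max - 1.

    F must be a continuous d.f. of a positive r.v.; mass beyond n_max/2^m is ignored.
    """
    h = Fraction(1, 2**m)
    return sum(n * h * (F((n + 1) * h) - F(n * h)) for n in range(n_max))


def uniform_df(x):
    """d.f. of the uniform distribution on [0, 1]."""
    return min(max(x, Fraction(0)), Fraction(1))


for m in range(1, 6):
    # on [0, 1] the partition has 2^m cells, so n_max = 2^m covers all the mass
    print(m, dyadic_expectation(uniform_df, m, 2**m))
```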

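For an arbitrary $X$, put $X^+ = X \vee 0$ and $X^- = (-X) \vee 0$, so that $X = X^+ - X^-$. Unless both $\mathscr{E}(X^+)$ and $\mathscr{E}(X^-)$ are $+\infty$, we define $\mathscr{E}(X) = \mathscr{E}(X^+) - \mathscr{E}(X^-)$. We say $X$ has a finite or infinite expectation (or expected value) according as $\mathscr{E}(X)$ is a finite number or $\pm\infty$. In the excepted case we shall say that the expectation of $X$ does not exist. The expectation, when it exists, is also denoted by $\int_\Omega X(\omega)\mathscr{P}(d\omega)$. More generally, for each $\Lambda$ in $\mathscr{F}$, we define

(4) $\int_\Lambda X(\omega)\mathscr{P}(d\omega) = \mathscr{E}(X 1_\Lambda)$.

… This classical notation is really an anachronism, originated in the days when a point function was more popular than a set function.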

Here m is atomless, so the notation is adequate and there is no need to distinguish between the different kinds of intervals. The general integral has the familiar properties of the Lebesgue integral on [0,1].

We list a few below for ready reference, some being easy consequences of others. As a general notation, the left member of (4) will be abbreviated to $\int_\Lambda X \, d\mathscr{P}$.

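In the following, $X$ and $Y$ are r.v.'s, $a$ and $b$ are constants, and $\Lambda$ is a set in $\mathscr{F}$. … (iii) Additivity over sets: if the $\Lambda_n$'s are disjoint, then $\int_{\bigcup_n \Lambda_n} X \, d\mathscr{P} = \sum_n \int_{\Lambda_n} X \, d\mathscr{P}$. (iv) Positivity: if $X \ge 0$ a.e. on $\Lambda$, then $\int_\Lambda X \, d\mathscr{P} \ge 0$. (v) Monotonicity: if $X_1 \le X \le X_2$ a.e. on $\Lambda$, then $\int_\Lambda X_1 \, d\mathscr{P} \le \int_\Lambda X \, d\mathscr{P} \le \int_\Lambda X_2 \, d\mathscr{P}$. (vi) Mean value theorem: if $a \le X \le b$ a.e. on $\Lambda$, then $a\mathscr{P}(\Lambda) \le \int_\Lambda X \, d\mathscr{P} \le b\mathscr{P}(\Lambda)$. …

We have

$\sum_{n=1}^\infty \mathscr{P}(|X| \ge n) \le \mathscr{E}(|X|) \le 1 + \sum_{n=1}^\infty \mathscr{P}(|X| \ge n)$.

Put $\Lambda_n = \{n \le |X| < n+1\}$; by (iii) and (vi), $\sum_{n=0}^\infty n\mathscr{P}(\Lambda_n) \le \mathscr{E}(|X|) \le \sum_{n=0}^\infty (n+1)\mathscr{P}(\Lambda_n)$. Now the partial sums of the series on the left may be rearranged (Abel's method of partial summation) to yield, for $N \ge 1$,

$\sum_{n=0}^N n\mathscr{P}(\Lambda_n) = \sum_{n=1}^N \mathscr{P}(|X| \ge n) - N\mathscr{P}(|X| \ge N+1)$.

…

Let $X \ge 0$ and $0 < \int_\Omega X \, d\mathscr{P} = A < \infty$. Then the set function $\nu$ defined on $\mathscr{F}$ as follows: $\nu(\Lambda) = \frac{1}{A}\int_\Lambda X \, d\mathscr{P}$, is a probability measure on $\mathscr{F}$. … $Y$ on $\Lambda$ with $\int_\Lambda Y \, d\mathscr{P}$ …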

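Deduce from Fatou's lemma: … Show this is false if the condition involving $Y$ is omitted. Given the r.v. $X$ with finite $\mathscr{E}(X)$, and $\varepsilon > 0$, there exists a simple r.v. $X_\varepsilon$ (see the end of Sec. …) such that $\mathscr{E}(|X - X_\varepsilon|) < \varepsilon$. Hence there exists a sequence of simple r.v.'s $X_m$ such that $\lim_m \mathscr{E}(|X - X_m|) = 0$, and we can choose $\{X_m\}$ so that $|X_m| \le |X|$ for all $m$. For any two sets $\Lambda_1$ and $\Lambda_2$ in $\mathscr{F}$, define $\rho(\Lambda_1, \Lambda_2) = \mathscr{P}(\Lambda_1 \triangle \Lambda_2)$; then $\rho$ is a pseudo-metric in the space of sets in $\mathscr{F}$; call the resulting metric space $M(\mathscr{F}, \mathscr{P})$. Prove that for each integrable r.v. $X$ the mapping of $M(\mathscr{F}, \mathscr{P})$ to $\mathscr{R}^1$ given by $\Lambda \to \int_\Lambda X \, d\mathscr{P}$ is continuous. Similarly, the mappings on $M(\mathscr{F}, \mathscr{P}) \times M(\mathscr{F}, \mathscr{P})$ to $M(\mathscr{F}, \mathscr{P})$ given by $(\Lambda_1, \Lambda_2) \to \Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2$, $\Lambda_1 \setminus \Lambda_2$, $\Lambda_1 \triangle \Lambda_2$ are all continuous.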

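… Deduce Exercise 2 above as a special case.

There is a basic relation between the abstract integral with respect to $\mathscr{P}$ over sets in $\mathscr{F}$ on the one hand, and the Lebesgue-Stieltjes integral with respect to $\mu$ over sets in $\mathscr{B}^1$ on the other, induced by each r.v. We give the version in one dimension first. Let $X$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ through the correspondence $\mu(B) = \mathscr{P}\{X \in B\}$, and let $f$ be Borel measurable. Then we have

(11) $\int_\Omega f(X(\omega))\mathscr{P}(d\omega) = \int_{\mathscr{R}^1} f(x)\mu(dx)$

provided that either side exists. Let $B \in \mathscr{B}^1$ and $f = 1_B$. Then the left side of (11) is $\mathscr{P}\{X \in B\}$ and the right side is $\mu(B)$. They are equal by the definition of $\mu$. This proves the theorem for $f = 1_B$; the general case follows by linearity and monotone convergence. We shall need the generalization of the preceding theorem in several dimensions.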

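No change is necessary except for notation, which we will give in two dimensions. Instead of the $\nu$ in (5) of Sec. …, let us write the "mass element" as $\mu^2(dx, dy)$, so that $\nu(A) = \iint_A \mu^2(dx, dy)$. Let $(X, Y)$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$ and let $f$ be a Borel measurable function of two variables. Then we have

$\int_\Omega f(X(\omega), Y(\omega))\mathscr{P}(d\omega) = \iint_{\mathscr{R}^2} f(x, y)\mu^2(dx, dy)$.

As a consequence of Theorem 3.…, we have: if $\mu_X$ and $F_X$ denote, respectively, the p.m. and d.f. induced by $X$, then

$\mathscr{E}(X) = \int_{\mathscr{R}^1} x \, \mu_X(dx) = \int_{-\infty}^{\infty} x \, dF_X(x)$,

and more generally $\mathscr{E}(f(X)) = \int_{\mathscr{R}^1} f(x)\mu_X(dx)$, with the usual proviso regarding existence.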

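Let $X$, $Y$, and $\mu^2$ be as in Theorem 3.…; taking $f(x, y) = x + y$ there, we obtain $\mathscr{E}(X + Y) = \mathscr{E}(X) + \mathscr{E}(Y)$ when both expectations are finite. This result is a case of the linearity of $\mathscr{E}$ given but not proved here; the proof above reduces this property in the general case of $(\Omega, \mathscr{F}, \mathscr{P})$ to the corresponding one in the special case $(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$. Such a reduction is frequently useful when there are technical difficulties in the abstract treatment.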

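We end this section with a discussion of "moments". Let $a$ be real and $r$ positive; then $\mathscr{E}(|X - a|^r)$ is called the absolute moment of $X$ of order $r$, about $a$. If $\mu$ and $F$ are, respectively, the p.m. and d.f. of $X$, then

$\mathscr{E}(|X - a|^r) = \int_{\mathscr{R}^1} |x - a|^r \mu(dx) = \int_{-\infty}^{\infty} |x - a|^r \, dF(x)$.

The moments about the mean $\mathscr{E}(X)$ are called central moments. That of order 2 is particularly important and is called the variance, $\operatorname{var}(X)$; its positive square root is the standard deviation $\sigma(X)$:

$\operatorname{var}(X) = \sigma^2(X) = \mathscr{E}\{(X - \mathscr{E}(X))^2\} = \mathscr{E}(X^2) - \{\mathscr{E}(X)\}^2$.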

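We note the inequality …; it is a special case of the next inequality, of which we will sketch a proof.

Jensen's inequality. If $\varphi$ is a convex function on $\mathscr{R}^1$, and $X$ and $\varphi(X)$ are integrable r.v.'s, then

(22) $\varphi(\mathscr{E}(X)) \le \mathscr{E}(\varphi(X))$.

Convexity means: for every positive $\lambda_1, \dots, \lambda_n$ with sum 1 we have

(23) $\varphi\left(\sum_{j=1}^n \lambda_j y_j\right) \le \sum_{j=1}^n \lambda_j \varphi(y_j)$.

We shall prove (22) for a simple r.v.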

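Let then $X$ take the value $y_j$ with probability $\lambda_j$, $1 \le j \le n$. Then we have by definition (1):

$\mathscr{E}(X) = \sum_{j=1}^n \lambda_j y_j, \quad \mathscr{E}(\varphi(X)) = \sum_{j=1}^n \lambda_j \varphi(y_j)$.

Thus (22) follows from (23). Finally, we prove a famous inequality that is almost trivial but very useful.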

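Chebyshev inequality. If $\varphi$ is a strictly positive and increasing function on $(0, \infty)$, $\varphi(u) = \varphi(-u)$, and $X$ is an r.v. such that $\mathscr{E}\{\varphi(X)\} < \infty$, then for each $u > 0$:

$\mathscr{P}\{|X| \ge u\} \le \dfrac{\mathscr{E}\{\varphi(X)\}}{\varphi(u)}$.

PROOF. We have by the mean value theorem:

$\mathscr{E}\{\varphi(X)\} = \int_\Omega \varphi(X) \, d\mathscr{P} \ge \int_{\{|X| \ge u\}} \varphi(X) \, d\mathscr{P} \ge \varphi(u)\mathscr{P}\{|X| \ge u\}$,

from which the inequality follows.

… For any d.f. … Thus if $X$ is a positive r.v., then $\mathscr{E}(X) = \int_0^\infty \mathscr{P}\{X > x\} \, dx$. Use Exercise 17 to express the mean of the maximum …

The r.v.'s $\{X_j, 1 \le j \le n\}$ are said to be (totally) independent iff for any linear Borel sets $\{B_j, 1 \le j \le n\}$ we have

(1) $\mathscr{P}\left\{\bigcap_{j=1}^n (X_j \in B_j)\right\} = \prod_{j=1}^n \mathscr{P}(X_j \in B_j)$.

The r.v.'s of an infinite family are said to be independent iff those in every finite subfamily are. They are said to be pairwise independent iff every two of them are independent. Note that (1) implies that the r.v.'s in every subset of $\{X_j, 1 \le j \le n\}$ are also independent, since we may take some of the $B_j$'s as $\mathscr{R}^1$. On the other hand, (1) is implied by the apparently weaker hypothesis: for every set of real numbers $\{x_j, 1 \le j \le n\}$,

(2) $\mathscr{P}\left\{\bigcap_{j=1}^n (X_j \le x_j)\right\} = \prod_{j=1}^n \mathscr{P}(X_j \le x_j)$.

In terms of the p.m. $\mu^n$ induced by the random vector $(X_1, \dots, X_n)$ on $(\mathscr{R}^n, \mathscr{B}^n)$, and the p.m.'s $\{\mu_j, 1 \le j \le n\}$ induced by each $X_j$ on $(\mathscr{R}^1, \mathscr{B}^1)$, the relation (1) may be written as $\mu^n(B_1 \times \dots \times B_n) = \prod_{j=1}^n \mu_j(B_j)$. Finally, we may introduce the $n$-dimensional distribution function corresponding to $\mu^n$, which is defined by the left side of (2) or in alternative notation:

$F(x_1, \dots, x_n) = \mathscr{P}\{X_j \le x_j, 1 \le j \le n\}$;

then (2) states that $F(x_1, \dots, x_n) = \prod_{j=1}^n F_j(x_j)$, where $F_j$ is the d.f. of $X_j$. By Theorem 3.…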

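The proof of the next theorem is similar and is left as an exercise. … If $X$ and $Y$ are independent and both have finite expectations, then

(5) $\mathscr{E}(XY) = \mathscr{E}(X)\mathscr{E}(Y)$.

We give two proofs in detail of this important result to illustrate the methods. First proof. Suppose first that the two r.v.'s $X$ and $Y$ are both discrete, belonging respectively to the weighted partitions $\{\Lambda_j; c_j\}$ and $\{M_k; d_k\}$ with $\Lambda_j = \{X = c_j\}$ and $M_k = \{Y = d_k\}$. Then $XY$ is discrete and equal to $c_j d_k$ on $\Lambda_j M_k$. Since $X$ and $Y$ are independent, we have for every $j$ and $k$:

$\mathscr{P}(\Lambda_j M_k) = \mathscr{P}(X = c_j; Y = d_k) = \mathscr{P}(X = c_j)\mathscr{P}(Y = d_k) = \mathscr{P}(\Lambda_j)\mathscr{P}(M_k)$,

and consequently

$\mathscr{E}(XY) = \sum_{j,k} c_j d_k \mathscr{P}(\Lambda_j M_k) = \sum_j c_j \mathscr{P}(\Lambda_j) \sum_k d_k \mathscr{P}(M_k) = \mathscr{E}(X)\mathscr{E}(Y)$.

Thus (5) is true in this case. Now let $X$ and $Y$ be arbitrary positive r.v.'s with finite expectations. Then, according to the discussion at the beginning of Sec. …, there are discrete r.v.'s $X_m$ and $Y_m$ such that $\mathscr{E}(X_m) \uparrow \mathscr{E}(X)$ and $\mathscr{E}(Y_m) \uparrow \mathscr{E}(Y)$. Furthermore, for each $m$, $X_m$ and $Y_m$ are independent. Note that for the independence of discrete r.v.'s it is sufficient to verify the relation (1) when the $B_j$'s are their possible values and "$\in$" is replaced by "$=$". Finally, it is clear that $X_m Y_m$ is increasing with $m$ and $0 \le XY - X_m Y_m \to 0$; hence, by the monotone convergence theorem,

$\mathscr{E}(XY) = \lim_m \mathscr{E}(X_m Y_m) = \lim_m \mathscr{E}(X_m)\mathscr{E}(Y_m) = \mathscr{E}(X)\mathscr{E}(Y)$.

For the general case, we use (2) and (3) of Sec. … and observe that the independence of $X$ and $Y$ implies that of $X^+$ and $Y^+$, $X^+$ and $Y^-$, and so on. This again can be seen directly or as a consequence of Theorem 3.… Hence we have, under our finiteness hypothesis:

$\mathscr{E}(XY) = \mathscr{E}\{(X^+ - X^-)(Y^+ - Y^-)\} = \mathscr{E}(X^+)\mathscr{E}(Y^+) - \mathscr{E}(X^+)\mathscr{E}(Y^-) - \mathscr{E}(X^-)\mathscr{E}(Y^+) + \mathscr{E}(X^-)\mathscr{E}(Y^-) = \mathscr{E}(X)\mathscr{E}(Y)$.

The first proof is completed.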

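Second proof. Consider the random vector $(X, Y)$ and let the p.m. induced by it be $\mu^2(dx, dy)$. Then we have by Theorem 3.…:

$\mathscr{E}(XY) = \int_\Omega XY \, d\mathscr{P} = \iint_{\mathscr{R}^2} xy \, \mu^2(dx, dy)$.

By independence, $\mu^2 = \mu_1 \times \mu_2$, where $\mu_1$ and $\mu_2$ are the p.m.'s of $X$ and $Y$, so the last integral equals

$\iint_{\mathscr{R}^2} xy \, \mu_1(dx)\mu_2(dy) = \int_{\mathscr{R}^1} x \, \mu_1(dx) \int_{\mathscr{R}^1} y \, \mu_2(dy) = \mathscr{E}(X)\mathscr{E}(Y)$,

finishing the proof! Observe that we are using here a very simple form of Fubini's theorem (see below). Indeed, the second proof appears to be so much shorter only because we are relying on the theory of "product measure" $\mu^2 = \mu_1 \times \mu_2$ on $(\mathscr{R}^2, \mathscr{B}^2)$. This is another illustration of the method of reduction mentioned in connection with the proof of (17) in Sec. … A rigorous proof of this fact may be supplied by Theorem 3.…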
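For discrete r.v.'s the first proof of (5) reduces to a finite computation. The following Python sketch carries it out with exact fractions. The helpers `product_law`, `mean` and `mean_of_product` and the particular laws are illustrative choices, not taken from the text. The second example shows that (5) can fail without independence.

```python
from fractions import Fraction
from itertools import product


def product_law(law_x, law_y):
    """Joint law of independent X, Y: P(X = x, Y = y) = P(X = x) P(Y = y)."""
    return {(x, y): px * py for (x, px), (y, py) in product(law_x.items(), law_y.items())}


def mean(law):
    """E(X) for a discrete law given as a dict value -> probability."""
    return sum(v * p for v, p in law.items())


def mean_of_product(joint):
    """E(XY) for a joint law given as a dict (x, y) -> probability."""
    return sum(x * y * p for (x, y), p in joint.items())


law_x = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
law_y = {0: Fraction(1, 2), 4: Fraction(1, 2)}
joint = product_law(law_x, law_y)
print(mean_of_product(joint), mean(law_x) * mean(law_y))  # 4 4

# Dependent case: Y = X with X uniform on {0, 1}; E(XY) = 1/2 but E(X)E(Y) = 1/4.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
diagonal = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
print(mean_of_product(diagonal), mean(coin) ** 2)
```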

Do independent random variables exist? Here we can take the cue from the intuitive background of probability theory which not only has given rise historically to this branch of mathematical discipline, but remains a source of inspiration, inculcating a way of thinking peculiar to the discipline. It may be said that no one could have learned the subject properly without acquiring some feeling for the intuitive content of the concept of stochastic independence, and through it, certain degrees of dependence.

Briefly then: if an unbiased coin is tossed and the two possible outcomes are recorded as 0 and 1, this is an r.v. taking these two values with probability 1/2 each. Repeated tossing will produce a sequence of outcomes.

If now a die is cast, the outcome may be similarly represented by an r.v. taking the six values $1, \dots, 6$ with probability 1/6 each. Next we may draw a card from a pack or a ball from an urn, or take a measurement of a physical quantity sampled from a given population, or make an observation of some fortuitous natural phenomenon, the outcomes in the last two cases being r.v.'s taking real values with a certain distribution.

Now it is very easy to conceive of undertaking these various trials under conditions such that their respective outcomes do not appreciably affect each other; indeed it would take more imagination to conceive the opposite! In this circumstance, idealized, the trials are carried out "independently of one another" and the corresponding r.v.'s are "independent" according to the definition above. We have thus "constructed" sets of independent r.v.'s in various cases. Can such a construction be made rigorous?

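We begin by an easy special case. Let $n \ge 2$ and let $(\Omega_j, \mathscr{F}_j, \mathscr{P}_j)$, $1 \le j \le n$, be $n$ discrete probability spaces, and let $\Omega^n = \Omega_1 \times \dots \times \Omega_n$ be the space of all ordered $n$-tuples $\omega^n = (\omega_1, \dots, \omega_n)$ with $\omega_j \in \Omega_j$. The product B.F. $\mathscr{F}^n$ is simply the collection of all subsets of $\Omega^n$. Since $\Omega^n$ is also a countable set, we may define a p.m. $\mathscr{P}^n$ on $\mathscr{F}^n$ by assigning to each point $\omega^n$ the probability

(7) $\mathscr{P}^n(\{\omega^n\}) = \prod_{j=1}^n \mathscr{P}_j(\{\omega_j\})$.

This p.m. $\mathscr{P}^n$ is called the product of the $\mathscr{P}_j$'s. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if $S_j \in \mathscr{F}_j$ for $1 \le j \le n$, then $\mathscr{P}^n(S_1 \times \dots \times S_n) = \prod_{j=1}^n \mathscr{P}_j(S_j)$.

Let $\mathscr{U}^n$ be the $n$-dimensional cube (immaterial whether it is closed or not): $\mathscr{U}^n = \{(x_1, \dots, x_n): 0 \le x_j \le 1, 1 \le j \le n\}$. The trivial measure space $(\mathscr{U}^n, \mathscr{B}^n, m^n)$, where $\mathscr{B}^n$ is the usual Borel field and $m^n$ the usual (Lebesgue) measure, is a probability space.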

The reader may recall the term "independent variables" used in calculus, particularly for integration in several variables. The two usages have some accidental rapport. The point of Example 2 is that there is a ready-made product measure there.

… it is possible to construct such a one based on given p.m.'s. … It remains to extend this definition to all of $\mathscr{B}^n$, or, more logically speaking, to prove that there exists a p.m. $\mu^n$ on $\mathscr{B}^n$ that has the product property. The situation is somewhat more complicated than in Example 1, just as Example 3 in Sec. … is more complicated than Example 1 there.

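Indeed, the required construction is exactly that of the corresponding Lebesgue-Stieltjes measure in $n$ dimensions. This will be subsumed in the next theorem. Assuming that it has been accomplished, sets of $n$ independent r.v.'s can be defined just as in Example 2. Can we construct r.v.'s on the probability space $(\mathscr{U}, \mathscr{B}, m)$ itself, without going to a product space? The simplest case will now be described and we shall return to it in the next chapter. For each real number $x$ in $(0, 1]$, consider its binary expansion $x = \sum_{n=1}^\infty \epsilon_n(x)/2^n$, each $\epsilon_n(x)$ being 0 or 1. This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes.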

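For the sake of definiteness, let us agree that only expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ is a function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let $\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set $\bigcap_{j=1}^n \{x: \epsilon_j(x) = c_j\}$ is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s. It is clear that this set is just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand, for each $j$, the set $\{x: \epsilon_j(x) = c_j\}$ has probability $1/2$, being a union of $2^{j-1}$ disjoint intervals of length $1/2^j$. Hence

$m\left(\bigcap_{j=1}^n \{\epsilon_j = c_j\}\right) = \dfrac{1}{2^n} = \prod_{j=1}^n m(\{\epsilon_j = c_j\})$,

so that the $\epsilon_j$'s are independent. This example seems extremely special, but easy extensions are at hand (see Exercises 13, 14, and 15 below). We are now ready to state and prove the fundamental existence theorem of product measures.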

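Let a finite or infinite sequence of p.m.'s $\{\mu_n\}$ on $(\mathscr{R}^1, \mathscr{B}^1)$ be given. There exists a probability space $(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of independent r.v.'s $\{X_n\}$ defined on it such that for each $n$, $\mu_n$ is the p.m. of $X_n$.

PROOF. Without loss of generality we may suppose that the given sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ be a probability space in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m.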

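Indeed this is possible if we take $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ to be $(\mathscr{R}^1, \mathscr{B}^1, \mu_n)$ and $X_n$ to be the identity function (cf. Exercise 3 of Sec. …). Let $\Omega = \Omega_1 \times \Omega_2 \times \cdots$ be the space of all sequences $\omega = (\omega_1, \omega_2, \dots)$ with $\omega_n \in \Omega_n$, and call a finite-product set any set of the form $E_1 \times E_2 \times \cdots$ with each $E_n \in \mathscr{F}_n$ and all but a finite number of the $E_n$ equal to $\Omega_n$. Let the collection of subsets of $\Omega$, each of which is the union of a finite number of disjoint finite-product sets, be $\mathscr{F}_0$. It is easy to see that the collection $\mathscr{F}_0$ is closed with respect to complementation and pairwise intersection, hence it is a field.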

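We shall take the $\mathscr{F}$ in the theorem to be the B.F. generated by $\mathscr{F}_0$. This $\mathscr{F}$ is called the product B.F. of the sequence $\{\mathscr{F}_n, n \ge 1\}$. We define a set function $\mathscr{P}$ on $\mathscr{F}_0$ as follows. First, for each finite-product set $E = E_1 \times E_2 \times \cdots$ we put $\mathscr{P}(E) = \prod_n \mathscr{P}_n(E_n)$, where all but a finite number of the factors are equal to one. Next, if $E \in \mathscr{F}_0$ and $E = \bigcup_{k=1}^n E^{(k)}$, where the $E^{(k)}$'s are disjoint finite-product sets, we put $\mathscr{P}(E) = \sum_{k=1}^n \mathscr{P}(E^{(k)})$. In order to verify countable additivity it is sufficient to verify the axiom of continuity, by the remark after Theorem 2.…: if $C_n \in \mathscr{F}_0$ and $C_n \downarrow \emptyset$, then

$\lim_n \mathscr{P}(C_n) = 0$.

To simplify the notation, and with no real loss of generality (why?), … Note that if $E \in \mathscr{F}_0$, … and so forth by induction. Thus for each $k$ … The next theorem, which is a generalization of the extension theorem discussed in Theorem 2.…, …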

This extension is called the product measure of the sequence $\{\mathscr{P}_n, n \ge 1\}$.


Find an example of $n$ r.v.'s … Fields or B.F.'s $\mathscr{G}_\alpha$ $(\subset \mathscr{F})$ of any family are said to be independent iff any collection of events, one from each $\mathscr{G}_\alpha$, forms a set of independent events. … Prove, however, that the conditions (1) and (2) are equivalent. Use Theorem 2.… $X$ is independent of itself if and only if it is constant with probability one. Can $X$ and $f(X)$ be independent where $f \in \mathscr{B}^1$? … Find the d.f. … What is the precise relation? Modify Example 4 so that to each $x$ in $[0, 1]$ there corresponds a sequence of independent and identically distributed r.v.'s …
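The independence of the binary digits in Example 4 can be verified by exact counting. In this Python sketch, one midpoint of each dyadic interval of level $L$ stands in for that interval. This is an assumption made to avoid the dyadic rationals, where the two expansions differ. Every pattern of the first $n \le L$ digits should then occur with relative frequency exactly $1/2^n$, which is the product of the marginal probabilities $1/2$. The helper names `digit` and `pattern_frequencies` are hypothetical.

```python
from fractions import Fraction


def digit(x, j):
    """j-th binary digit of x in (0, 1), valid when x is not a dyadic rational of level <= j."""
    return int(x * 2**j) % 2


def pattern_frequencies(L, n):
    """Relative frequency of each pattern (eps_1, ..., eps_n) over the 2^L midpoints
    x_t = (2t + 1) / 2^(L+1), one in each dyadic interval of length 2^-L."""
    assert 1 <= n <= L
    counts = {}
    for t in range(2**L):
        x = Fraction(2 * t + 1, 2 ** (L + 1))
        key = tuple(digit(x, j) for j in range(1, n + 1))
        counts[key] = counts.get(key, 0) + 1
    return {key: Fraction(c, 2**L) for key, c in counts.items()}


freqs = pattern_frequencies(L=10, n=3)
print(len(freqs), set(freqs.values()))  # 8 {Fraction(1, 8)}
```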

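… A typical application of Fubini's theorem is as follows. …

Here and hereafter the term "convergence" will be used to mean convergence to a finite limit. The sequence of r.v.'s $\{X_n\}$ is said to converge almost everywhere (a.e.) to the r.v. $X$ iff there exists a null set $N$ such that $\lim_n X_n(\omega) = X(\omega)$, finite, for every $\omega \in \Omega \setminus N$. Thus it makes sense to say simply that $\{X_n\}$ converges a.e.; the limit is then a finite-valued r.v. outside a null set. This type of trivial consideration makes it possible, when dealing with a countable set of r.v.'s that are finite a.e., to regard them as finite everywhere. We now give a characterization of convergence a.e.

The sequence $\{X_n\}$ converges a.e. to $X$ if and only if for every $\varepsilon > 0$ we have

(2) $\lim_{m \to \infty} \mathscr{P}\{|X_n - X| \le \varepsilon \text{ for all } n \ge m\} = 1$,

or equivalently

(2′) $\lim_{m \to \infty} \mathscr{P}\{|X_n - X| > \varepsilon \text{ for some } n \ge m\} = 0$.

PROOF. Suppose there is convergence a.e. … Conversely, suppose (2) holds; …

A weaker concept of convergence is of basic importance in probability theory. The sequence $\{X_n\}$ is said to converge in probability (in pr.) to $X$ iff for every $\varepsilon > 0$ we have

(5) $\lim_n \mathscr{P}\{|X_n - X| > \varepsilon\} = 0$.

Strictly speaking, the definition applies when all $X_n$ and $X$ are finite-valued. But we may extend it to r.v.'s that are finite a.e. by ignoring a null set. Since (2′) clearly implies (5), we have the immediate consequence: convergence a.e. implies convergence in pr. Sometimes we have to deal with questions of convergence when no limit is in evidence.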
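Convergence in pr. does not imply convergence a.e. The examples the text has in mind are not recoverable from this copy, so the following Python sketch uses the standard "sliding indicator" sequence on $([0, 1), \mathscr{B}, m)$ as a stand-in. Write $n = 2^k + i$ with $0 \le i < 2^k$, and let $X_n$ be the indicator of $[i/2^k, (i+1)/2^k)$. Then $\mathscr{P}\{X_n \ne 0\} = 2^{-k} \to 0$, so $X_n \to 0$ in pr. Yet every $\omega$ lies in exactly one interval of each block $2^k \le n < 2^{k+1}$, so $X_n(\omega) = 1$ infinitely often and $X_n(\omega)$ converges for no $\omega$. The function names are illustrative.

```python
from fractions import Fraction


def interval(n):
    """For n = 2^k + i with 0 <= i < 2^k, return the interval [i/2^k, (i+1)/2^k)."""
    k = n.bit_length() - 1
    i = n - 2**k
    return Fraction(i, 2**k), Fraction(i + 1, 2**k)


def X(n, w):
    """Indicator of the n-th interval, evaluated at the sample point w in [0, 1)."""
    a, b = interval(n)
    return 1 if a <= w < b else 0


def prob_nonzero(n):
    """P{X_n != 0}, the length of the n-th interval."""
    a, b = interval(n)
    return b - a


w = Fraction(1, 3)
hits = [n for n in range(1, 2**12) if X(n, w) == 1]
print(prob_nonzero(2**11), len(hits))  # 1/2048 12: one hit in each block 2^k <= n < 2^(k+1)
```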

For convergence a.e. … For convergence in pr. … It can be shown (Exercise 6 of Sec. …) … If $X_n$ converges to 0 in $L^p$, that is, $\mathscr{E}(|X_n|^p) \to 0$, then it converges to 0 in pr.; indeed, by the Chebyshev inequality with $\varphi(x) = |x|^p$, $\mathscr{P}\{|X_n| \ge \varepsilon\} \le \mathscr{E}(|X_n|^p)/\varepsilon^p \to 0$. Hence there is no loss of generality to assume $X \equiv 0$. This proves the first assertion. If now $|X_n| \le Y$ … The general result is as follows. … The reader should not be misled by these somewhat "artificial" examples to think that such counterexamples are rare or abnormal.

Natural examples abound in more advanced theory, such as that of stochastic processes, but they are not as simple as those discussed above. Here are some culled from later chapters, tersely stated. … This kind of example can be formulated for any recurrent process such as a Brownian motion.

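The same holds for the martingale that consists in tossing a fair coin and doubling the stakes until the first head (see Sec. …). Finally, we mention another kind of convergence which is basic in functional analysis, but confine ourselves to $L^1$. The sequence of r.v.'s $\{X_n\}$ in $L^1$ is said to converge weakly in $L^1$ to $X$ iff for each bounded r.v. $Y$ we have $\lim_n \mathscr{E}(X_n Y) = \mathscr{E}(XY)$, finite. Clearly convergence in $L^1$ defined above implies weak convergence; hence the former is sometimes referred to as "strong".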

On the other hand, Example 2 above shows that convergence a.e. … Let $f$ be a bounded uniformly continuous function in $\mathscr{R}^1$. … Let $f$ be a continuous function on $\mathscr{R}^1$. … The result is false if … The extended-valued r.v. $X$ is said to be bounded in pr. iff for each $\varepsilon > 0$ there exists a finite $M(\varepsilon)$ such that $\mathscr{P}\{|X| \le M(\varepsilon)\} \ge 1 - \varepsilon$. Prove that $X$ is bounded in pr. if and only if it is finite a.e. The sequence of extended-valued r.v.'s … Instead of the $\rho$ in Theorem 4.…, … Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

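Convergence in pr. for arbitrary r.v.'s … In a space of uniformly bounded r.v.'s … Unlike convergence in pr., … Prove that for any bounded r.v. …

Let $\{E_n\}$ be a sequence of events. We define

$\limsup_n E_n = \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty E_n, \quad \liminf_n E_n = \bigcup_{m=1}^\infty \bigcap_{n=m}^\infty E_n$.

These notions can be defined for subsets of an arbitrary space $\Omega$.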

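The main properties of these sets will be given in the following two propositions.

(i) A point belongs to $\limsup_n E_n$ if and only if it belongs to infinitely many terms of the sequence $\{E_n, n \ge 1\}$. A point belongs to $\liminf_n E_n$ if and only if it belongs to all terms of the sequence from a certain term on.

PROOF. Put $F_m = \bigcup_{n=m}^\infty E_n$, so that $\limsup_n E_n = \bigcap_m F_m$. If $\omega$ belongs to infinitely many $E_n$'s, then for each $m$ there is an $n \ge m$ with $\omega \in E_n$, hence $\omega \in F_m$ for every $m$. Conversely, if $\omega$ belongs to $\bigcap_m F_m$, then $\omega \in F_m$ for every $m$. Were $\omega$ to belong to only a finite number of the $E_n$'s, there would be an $m$ such that $\omega \notin E_n$ for $n \ge m$, so that $\omega \notin \bigcup_{n=m}^\infty E_n = F_m$, a contradiction. The second assertion is proved similarly.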

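In more intuitive language: the event $\limsup_n E_n$ occurs if and only if the events $E_n$ occur infinitely often, and we write $\mathscr{P}(\limsup_n E_n) = \mathscr{P}(E_n \text{ i.o.})$. The advantage of such a notation is better shown if we consider, for example, the events "$|X_n| \ge \varepsilon$" and the probability $\mathscr{P}\{|X_n| \ge \varepsilon \text{ i.o.}\}$.

(ii) If each $E_n \in \mathscr{F}$, then $\mathscr{P}(\limsup_n E_n) = \lim_m \mathscr{P}(F_m)$.

PROOF. Using the notation in the preceding proof, it is clear that $F_m$ decreases as $m$ increases. Hence by the monotone property of p.m.'s, $\mathscr{P}(\bigcap_m F_m) = \lim_m \mathscr{P}(F_m)$.

We have for arbitrary events $\{E_n\}$: if $\sum_n \mathscr{P}(E_n) < \infty$, then $\mathscr{P}(E_n \text{ i.o.}) = 0$.

PROOF. By Boole's inequality for p.m.'s, we have $\mathscr{P}(F_m) \le \sum_{n=m}^\infty \mathscr{P}(E_n)$; hence the hypothesis implies that $\mathscr{P}(F_m) \to 0$, and the conclusion follows by (ii).

As an illustration of the convenience of the new notions, we may restate Theorem 4.… … The intuitive content of condition (5) below is the point being stressed here.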

Briefly stated: … Then we have by (4): … Here the index involved in "i.o." is $n$; … a point $\omega$ that belongs to infinitely many $B_n$'s. For any sequence of r.v.'s … Choose $n_k$ so that …; cf. Theorem 4.… Use Theorem 4.… Consider all pairs of rational numbers $(a, b)$ … This is equivalent to: … Theorems 4.… are the two halves of the Borel-Cantelli lemma. The first is more useful since the events there may be completely arbitrary. The second has an extension to pairwise independent r.v.'s.

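It is a useful technique in probability theory. What has been said so far is true for arbitrary $E_n$'s. Now the hypothesis in (6) may be written as

$\sum_n \mathscr{E}(1_{E_n}) = \infty$.

Put $J_k = \sum_{n=1}^k 1_{E_n}$. Using Chebyshev's inequality, we have for each $A > 0$

(10) $\mathscr{P}\{|J_k - \mathscr{E}(J_k)| \le A\sigma(J_k)\} \ge 1 - \dfrac{1}{A^2}$,

where $\sigma^2(J)$ denotes the variance of $J$.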

Since … This is an example of a "zero-or-one" law to be discussed later, though it is not included in any of the general results there.

Let $E_n$ be the event that a real number in $[0, 1]$ has its $n$-ary expansion begin with 0. Prove that $\sum_n \mathscr{P}(E_n) = \infty$ but $\mathscr{P}(E_n \text{ i.o.}) = 0$. Prove that the probability of convergence of a sequence of independent r.v.'s is equal to zero or one. … Strengthen Theorem 4.…
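The Chebyshev step $\mathscr{P}\{|J_k - \mathscr{E}(J_k)| \le A\sigma(J_k)\} \ge 1 - 1/A^2$ can be tested against the exact law of $J_k$, the number of the events $E_1, \dots, E_k$ that occur, when these events are independent. The Python sketch below assumes $\mathscr{P}(E_n) = 1/n$ for illustration, so that $\sum_n \mathscr{P}(E_n) = \infty$. The exact law of $J_k$ is computed by the usual convolution recursion for a sum of independent indicators. The helper names are hypothetical.

```python
import math


def indicator_sum_law(ps):
    """Exact law of J = I_1 + ... + I_k for independent indicators with P(I_n = 1) = ps[n-1]."""
    law = [1.0]
    for p in ps:
        new = [0.0] * (len(law) + 1)
        for j, q in enumerate(law):
            new[j] += q * (1 - p)   # I_n = 0
            new[j + 1] += q * p     # I_n = 1
        law = new
    return law


def chebyshev_check(ps, A):
    """Return (P{|J - E(J)| <= A sigma(J)}, 1 - 1/A^2)."""
    law = indicator_sum_law(ps)
    mean = sum(ps)
    sigma = math.sqrt(sum(p * (1 - p) for p in ps))
    inside = sum(q for j, q in enumerate(law) if abs(j - mean) <= A * sigma)
    return inside, 1 - 1 / A**2


ps = [1 / n for n in range(1, 201)]  # P(E_n) = 1/n, n = 1, ..., 200
for A in (1.5, 2.0, 3.0):
    print(A, chebyshev_check(ps, A))
```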

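Use (11) above. Is it true that $\lim_n$ …? The answer is no from trivial examples. According to our definition of an r.v., … This example can be easily ramified; e.g., … We leave this to the reader but proceed to give the appropriate definitions which take into account the two kinds of troubles discussed above. A sequence $\{\mu_n\}$ of s.p.m.'s (subprobability measures, i.e., measures of total mass at most one) is said to converge vaguely to an s.p.m. $\mu$ iff there exists a dense subset $D$ of $\mathscr{R}^1$ such that $\mu_n((a, b]) \to \mu((a, b])$ for all $a, b \in D$ with $a < b$. … We shall see that the vague limit is unique below. Let $\mu_n$ and $\mu$ be s.p.m.'s. The following propositions are equivalent.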

Finally, suppose (iii) is true so that (1) holds. … The theorem is proved. As an immediate consequence, the vague limit is unique. Another consequence is … The case of strict probability measures will now be treated. Let $\mu_n$ and $\mu$ be p.m.'s. Then (i), (ii), and (iii) in the preceding theorem are equivalent to the following "uniform" strengthening of (i). … Then there exists an integer … Recall that the latter is sequentially compact, which means: given any sequence of numbers in the set, there is a subsequence which converges, and the limit is also a number in the set.

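This is the fundamental Bolzano-Weierstrass theorem. The set of all s.p.m.'s on $\mathscr{R}^1$ is sequentially compact with respect to vague convergence: given any sequence of s.p.m.'s, there is a subsequence that converges vaguely to an s.p.m. This result is often referred to as "Helly's extraction (or selection) principle".

PROOF. Here it is convenient to consider the subdistribution function (s.d.f.) $F_n$ defined as follows: $\forall x: F_n(x) = \mu_n((-\infty, x])$. … To $F$ corresponds a unique s.p.m. $\mu$ such that $\mu((a, b]) = F(b) - F(a)$ for $a < b$, as in Theorem 2.… Now the relation (8) yields, upon taking differences, $\mu_{n_k}((a, b]) \to \mu((a, b])$ for $a < b$ in a dense set, and the theorem is proved.

The reader should be able to confirm the truth of the following proposition about real numbers: let $\{x_n\}$ be a sequence of real numbers such that every subsequence that tends to a limit ($\pm\infty$ allowed) has the same value for the limit; then the whole sequence tends to this limit.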

In particular a bounded sequence such that every convergent subsequence has the same limit is convergent to this limit. The next theorem generalizes this result to vague convergence of s. It is not contained in the preceding proposition but can be reduced to it if we use the properties of vague convergence; see also Exercise 9 below.

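If every vaguely convergent subsequence of the sequence $\{\mu_n\}$ of s.p.m.'s converges to the same $\mu$, then $\mu_n$ converges vaguely to $\mu$.

PROOF. To prove the theorem by contraposition, suppose $\mu_n$ does not converge vaguely to $\mu$. Then by Theorem 4.…, there exists a continuity interval $(a, b)$ of $\mu$ such that $\mu_n((a, b))$ does not converge to $\mu((a, b))$. By the Bolzano-Weierstrass theorem there exists a subsequence $\{n_k\}$ such that $\mu_{n_k}((a, b))$ converges to a limit $L \ne \mu((a, b))$. By Theorem 4.…, the sequence $\{\mu_{n_k}\}$ contains a subsequence, say $\{\mu_{n'_k}\}$, which converges vaguely, hence to $\mu$ by hypothesis of the theorem. Hence again by Theorem 4.…, $\mu_{n'_k}((a, b)) \to \mu((a, b))$. But the left side also tends to $L$, a contradiction.

Perhaps the most logical approach to vague convergence is as follows. … The definition given before implies this, of course, but prove the converse. Prove that if (1) is true, then there exists a dense set $D'$ such that … Can a sequence of absolutely continuous p.m.'s converge vaguely to a discrete p.m.? Can a sequence of discrete p.m.'s converge vaguely to an absolutely continuous p.m.?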
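Whether a sequence of discrete p.m.'s can converge vaguely to an absolutely continuous one can be explored with a short computation. The answer is yes. In the Python sketch below (an illustration, not from the text), $\mu_n$ puts mass $1/n$ at each point $k/n$, $1 \le k \le n$. Its d.f. $F_n$ differs from the uniform d.f. $F$ on $[0, 1]$ by at most $1/n$ everywhere. So $\mu_n$ converges vaguely, with uniform convergence of the d.f.'s, to the absolutely continuous uniform p.m. The supremum is approximated on a finite grid.

```python
import math


def F_n(x, n):
    """d.f. of the discrete p.m. with mass 1/n at each k/n, k = 1, ..., n."""
    return min(max(math.floor(n * x), 0), n) / n


def F(x):
    """d.f. of the uniform p.m. on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def max_gap(n, grid=10_000):
    """max |F_n(x) - F(x)| over the grid points i/grid in [-1, 2]."""
    xs = [i / grid for i in range(-grid, 2 * grid + 1)]
    return max(abs(F_n(x, n) - F(x)) for x in xs)


for n in (10, 100, 1000):
    print(n, max_gap(n))
```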

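If a sequence of p.m.'s converges vaguely to an atomless p.m., then the convergence is uniform for all intervals, finite or infinite. This is due to Pólya. … Rényi [24]. Use (7) of Sec. … Prove a convergence theorem in metric space that will include both Theorem 4.… and … Use Exercise 9 of Sec. …

This has to do with classes of continuous functions on $\mathscr{R}^1$: $C_K$, the class of continuous functions each vanishing outside a compact set; $C_0$, the class of continuous functions $f$ with $\lim_{|x| \to \infty} f(x) = 0$; $C_B$, the class of bounded continuous functions; $C$, the class of continuous functions. It is well known that $C_0$ is the closure of $C_K$ with respect to uniform convergence.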

An arbitrary function f defined on an arbitrary space is said to have support in a subset S of the space iff it vanishes outside S. This lemma becomes obvious as soon as its geometric meaning is grasped.

But let us remark that the lemma is also a particular case of the Stone-Weierstrass theorem (see, e.g., …). Such a sledgehammer approach has its merit, as other kinds of approximation soon to be needed can also be subsumed under the same theorem. Indeed, the discussion in this section is meant in part to introduce some modern terminology to the relevant applications in probability theory.

We can now state the following alternative criterion for vague convergence. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. Then $\mu_n$ converges vaguely to $\mu$ if and only if

(2) $\forall f \in C_K: \int f \, d\mu_n \to \int f \, d\mu$.

… Hence by the linearity of integrals it is also true when $f$ is any $D$-valued step function. … The second term converges to zero as $n \to \infty$ because the approximating function is a $D$-valued step function. Conversely, suppose (2) is true for $f \in C_K$. Let $A$ be the set of atoms of $\mu$ as in the proof of Theorem 4.… … This must then be the same for every vaguely convergent subsequence, according to the hypothesis of the corollary.

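The vague limit of every such sequence is therefore uniquely determined (why?). … A similar estimate holds with $\mu$ replacing $\mu_n$ above, by (7). Now the argument leading from (3) to (2) finishes the proof of (6) in the same way. … Theorems 4.… This is the sense of Example 2 in Sec. … It is sometimes demanded that such a limit be a p.m. The following criterion is not deep, but applicable.

Let a family of p.m.'s $\{\mu_\alpha, \alpha \in A\}$ be given on an arbitrary index set $A$. In order that every sequence of them contains a subsequence which converges vaguely to a p.m., it is necessary and sufficient that the following condition be satisfied: for any $\varepsilon > 0$, there exists a finite interval $I$ such that

(11) $\inf_{\alpha \in A} \mu_\alpha(I) > 1 - \varepsilon$.

PROOF. Suppose (11) holds. For any sequence $\{\mu_n\}$ from the family, Helly's extraction principle yields a subsequence $\{\mu'_n\}$ converging vaguely to some s.p.m. $\mu$. We show that $\mu$ is a p.m. Let $J$ be a continuity interval of $\mu$ which contains the $I$ in (11). Then $\mu(\mathscr{R}^1) \ge \mu(J) = \lim_n \mu'_n(J) \ge \liminf_n \mu'_n(I) \ge 1 - \varepsilon$; since $\varepsilon$ is arbitrary, $\mu(\mathscr{R}^1) = 1$. …

The preceding theorem can be stated as follows: a family of p.m.'s is relatively compact if and only if it satisfies (11). The word "relatively" purports that the limit need not belong to the family; the word "compact" is an abbreviation of "sequentially vaguely convergent to a strict p.m."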
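The condition that some finite interval carry mass greater than $1 - \varepsilon$ uniformly over the family fails when mass escapes to infinity. As an illustration, with the family chosen by assumption, take $\mu_s$ to be the normal p.m. with mean 0 and standard deviation $s$. Then $\mu_s([-M, M]) = \operatorname{erf}(M/(s\sqrt{2}))$ tends to 0 as $s \to \infty$ for every fixed $M$. So no finite interval works, and the vague limit of $\mu_s$ along $s = 1, 10, 100, \dots$ is the zero measure rather than a p.m. The Python sketch uses only `math.erf` from the standard library.

```python
import math


def normal_mass(M, s):
    """mu_s([-M, M]) for the normal p.m. with mean 0 and standard deviation s."""
    return math.erf(M / (s * math.sqrt(2)))


for M in (1, 10, 100):
    print(M, [round(normal_mass(M, 10**k), 6) for k in range(0, 7)])
```

For $M = 100$ the first values print as 1.0 because of floating-point rounding, but for every $M$ the mass eventually drains away.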

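The new definition of vague convergence in Theorem 4.… has the advantage that it can be at once carried over to measures in more general topological spaces. There is no substitute for "intervals" in such a space, but the classes $C_K$, $C_0$, and $C_B$ are readily available.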

We will illustrate the general approach by indicating one more result in this direction. … Usually $f$ is allowed to be extended-valued; but to avoid complications we will deal with bounded functions only and denote by $L$ and $V$ respectively the classes of bounded lower semicontinuous and bounded upper semicontinuous functions.

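This gives the first inequality in …

A sequence of r.v.'s $\{X_n\}$ is said to converge in distribution to the d.f. $F$ iff the sequence of d.f.'s of the $X_n$ converges vaguely to $F$. … More briefly stated, convergence in pr. implies convergence in dist. … Since $f$ is bounded, the convergence holds also in $L^1$ by Theorem 4.… For instance, convergence in dist. of $X_n$ to $X$ and of $Y_n$ to $Y$ does not in general imply convergence in dist. of $X_n + Y_n$ to $X + Y$. This is in contrast to the true convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. … But if $X_n$ and $Y_n$ are independent, then the preceding assertion is indeed true as a property of the convergence of convolutions of distributions (see Chapter 6).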

However, in the simple situation of the next theorem no independence assumption is needed. The result is useful in dealing with limit distributions in the presence of nuisance terms. If $X_n$ converges in dist. to $X$ and $Y_n$ converges in dist. to 0, then (a) $X_n + Y_n$ converges in dist. to $X$, and (b) $X_n Y_n$ converges in dist. to 0. … (Exercise 4 below.)

Let $\mu_n$ and $\mu$ be p.m.'s such that $\mu_n$ converges vaguely to $\mu$. Show that the conclusion in (2) need not hold if (a) $f$ is bounded and Borel measurable and all $\mu_n$ and $\mu$ are absolutely continuous, or (b) $f$ is continuous except at one point and every $\mu_n$ is absolutely continuous. (To find even sharper counterexamples would not be too easy, in view of Exercise 10 of Sec. ….) … when the $\mu_n$'s are s.p.m.'s. … Then for each finite continuity interval $I$ we have $\int_I f \, d\mu_n \to \int_I f \, d\mu$. Let $\mu_n$ and $\mu$ be as in Exercise 1. If the $f_n$'s are bounded continuous functions converging uniformly to $f$, then $\int f_n \, d\mu_n \to \int f \, d\mu$. Give an example to show that convergence in dist. does not imply that in pr. However, show that convergence in dist. to the unit mass $\delta_a$ does imply convergence in pr. to the constant $a$. Let the r.v.'s … Prove the Corollary to Theorem 4.…
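A standard example shows both that convergence in dist. does not imply convergence in pr., and why convergence in dist. of $X_n$ and $Y_n$ separately says nothing about $X_n + Y_n$. In this Python sketch (the specific laws are illustrative assumptions), $X$ takes the values $\pm 1$ with probability $1/2$, $X_n = X$ and $Y_n = -X$. Both $X_n$ and $Y_n$ have the law of $X$ for every $n$, so both converge in dist. to that law. But $|Y_n - X| = 2$ always, so $Y_n$ does not converge to $X$ in pr. Also $X_n + Y_n = 0$, whereas the sum of two independent r.v.'s with the law of $X$ takes the values $-2, 0, 2$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X takes the values -1 and +1 with probability 1/2 each
LAW_X = {-1: Fraction(1, 2), 1: Fraction(1, 2)}


def law_of(func, law):
    """Law of func(X) when X has the given law (a dict: value -> probability)."""
    out = Counter()
    for x, p in law.items():
        out[func(x)] += p
    return dict(out)


def law_of_independent_sum(law1, law2):
    """Law of U + V for independent U, V with the given laws (a convolution)."""
    out = Counter()
    for (u, p), (v, q) in product(law1.items(), law2.items()):
        out[u + v] += p * q
    return dict(out)


law_Y = law_of(lambda x: -x, LAW_X)              # Y_n = -X
law_sum = law_of(lambda x: x + (-x), LAW_X)      # X_n + Y_n = 0
law_gap = law_of(lambda x: abs(-x - x), LAW_X)   # |Y_n - X| = 2
print(law_Y == LAW_X, law_sum, law_gap)
print(law_of_independent_sum(LAW_X, LAW_X))
```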

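If the r.v.'s … Derive another proof of Theorem 4.… The Lévy distance of two s.d.f.'s $F$ and $G$ is defined to be the infimum of all $\varepsilon > 0$ satisfying $F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon$ for all $x$. Prove that this is indeed a metric in the space of s.d.f.'s.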
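The Lévy distance, taken here as the infimum of $\varepsilon > 0$ with $F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon$ for all $x$, can be approximated numerically. The Python sketch below is only an approximation under stated assumptions. The inequalities are checked on a finite grid, so up to the bisection tolerance the result can only underestimate the true distance. The bisection relies on the fact that if some $\varepsilon$ works then every larger one works, because s.d.f.'s are nondecreasing. Also, $\varepsilon = 1$ always works since s.d.f.'s take values in $[0, 1]$. For the uniform d.f. on $[0, 1]$ and its translate by $h = 0.2$, a direct calculation gives the Lévy distance $h/2 = 0.1$.

```python
def levy_distance(F, G, xs, tol=1e-9):
    """Approximate inf{eps > 0 : F(x - eps) - eps <= G(x) <= F(x + eps) + eps for all x}
    by bisection, checking the inequalities only at the grid points xs."""

    def admissible(eps):
        return all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in xs)

    lo, hi = 0.0, 1.0  # eps = 1 is always admissible for s.d.f.'s
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def uniform_df(x):
    """d.f. of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def shifted_df(x):
    """d.f. of the uniform distribution on [0.2, 1.2]."""
    return uniform_df(x - 0.2)


grid = [i / 1000 for i in range(-1000, 3001)]
print(levy_distance(uniform_df, shifted_df, grid))  # close to 0.1
```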

Find two sequences of p.m.'s … [HINT: In general one may proceed by contradiction using an $f$ that oscillates at infinity.] Let $F_n$ and $F$ be d.f.'s … Define $G_n(\theta)$ and $G(\theta)$ as in Exercise 4 of Sec. … Do this first when $F_n$ and $F$ are continuous and strictly increasing.

Indeed, we have seen in Example 2 of Sec. … It is useful to have conditions to ensure the convergence of moments when $X_n$ converges a.e. We begin with a standard theorem in this direction from classical analysis. … The next result should be compared with Theorem 4.… We prove the second assertion since the first is similar.

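The family $\{X_t\}$ of r.v.'s is said to be uniformly integrable iff

(5) $\lim_{A \to \infty} \sup_t \int_{\{|X_t| > A\}} |X_t| \, d\mathscr{P} = 0$.

Uniform integrability is also an essential hypothesis in certain convergence questions arising in the theory of martingales to be treated in Chapter 9. The family $\{X_t\}$ is uniformly integrable if and only if (a) $\mathscr{E}(|X_t|)$ is bounded in $t$, and (b) for every $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that for any $E \in \mathscr{F}$, $\mathscr{P}(E) < \delta(\varepsilon)$ implies $\int_E |X_t| \, d\mathscr{P} < \varepsilon$ for every $t$. Clearly (5) implies (a).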

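Thus (5) implies (b). Conversely, suppose that (a) and (b) are true. … Thus (5) is true.

Let $0 < r < \infty$, $X_n \in L^r$, and $X_n \to X$ in pr. Then the following three propositions are equivalent: (i) $\{|X_n|^r\}$ is uniformly integrable; (ii) $X_n \to X$ in $L^r$; (iii) $\mathscr{E}(|X_n|^r) \to \mathscr{E}(|X|^r) < \infty$. … Suppose (ii) is true; then we have (iii) by the second assertion of Theorem 4.… Finally suppose (iii) is true. … (see … for the construction of such a function). Hence we have …, where the inequalities follow from the shape of $f_A$, while the limit relation in the middle is as in the proof of Theorem 4.… Subtracting from the limit relation in (iii), we obtain … This means … This establishes (i), and completes the proof of the theorem.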
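A standard example, given as an illustration and not taken from the text, shows that a bound on $\mathscr{E}(|X_n|)$ alone does not give uniform integrability, $\lim_{A \to \infty} \sup_n \int_{\{|X_n| > A\}} |X_n| \, d\mathscr{P} = 0$. It also shows why convergence a.e. does not imply convergence of expectations. On $((0, 1), \mathscr{B}, m)$ let $X_n = n$ on $(0, 1/n)$ and $X_n = 0$ elsewhere. Then $X_n \to 0$ everywhere and $\mathscr{E}(X_n) = 1$ for all $n$. But $\int_{\{X_n > A\}} X_n \, d\mathscr{P} = 1$ whenever $n > A$, so the supremum stays equal to 1 for every $A$. The Python sketch computes these quantities exactly; the function names are hypothetical.

```python
from fractions import Fraction


def X(n, w):
    """X_n(w) = n for 0 < w < 1/n, and 0 otherwise."""
    return n if 0 < w < Fraction(1, n) else 0


def expectation(n):
    """E(X_n) = n * m((0, 1/n))."""
    return n * Fraction(1, n)


def tail_integral(n, A):
    """Integral of X_n over {X_n > A}: that set is (0, 1/n) if n > A, else empty."""
    return expectation(n) if n > A else Fraction(0)


def sup_tail(A, n_max):
    """sup over 1 <= n <= n_max of the integral of X_n over {X_n > A}."""
    return max(tail_integral(n, A) for n in range(1, n_max + 1))


w = Fraction(1, 100)
print([X(n, w) for n in (10, 99, 100, 101, 1000)])  # [10, 99, 0, 0, 0]
print([sup_tail(A, 10 * A) for A in (1, 10, 100)])  # stays 1: not uniformly integrable
```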

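In the remainder of this section the term "moment" will be restricted to a moment of positive integral order. It is well known (see Exercise 5 of Sec. …) that a d.f. concentrated on $[0, 1]$ is uniquely determined by its moments of all orders. Precisely, if $F_1$ and $F_2$ are two d.f.'s such that $F_i(0-) = 0$ and $F_i(1) = 1$ for $i = 1, 2$, and $\int_0^1 x^n \, dF_1(x) = \int_0^1 x^n \, dF_2(x)$ for all $n \ge 1$, then $F_1 \equiv F_2$. The corresponding result is false in $(\mathscr{R}^1, \mathscr{B}^1)$ and a further condition on the moments is required to ensure uniqueness.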

The sufficient condition due to Carleman is as follows:

$\sum_{r=1}^\infty \left(m^{(2r)}\right)^{-1/(2r)} = +\infty$,

where $m^{(r)}$ denotes the moment of order $r$. We shall not go into these questions here but shall content ourselves with the useful result below, which is often referred to as the "method of moments"; see also Theorem 6.…

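Suppose there is a unique d.f. $F$ with the moments $\{m^{(r)}, r \ge 1\}$, all finite. Suppose that $\{F_n\}$ is a sequence of d.f.'s, each of which has all its moments finite: $m_n^{(r)} = \int_{-\infty}^{\infty} x^r \, dF_n(x)$. Finally, suppose that for every $r \ge 1$:

(8) $\lim_n m_n^{(r)} = m^{(r)}$.

Then $F_n$ converges vaguely to $F$.

PROOF. Let $\mu_n$ be the p.m. corresponding to $F_n$, and let $\{\mu_{n_k}\}$ be any subsequence converging vaguely to some $\mu$. We shall show that $\mu$ is indeed a p.m. with the d.f. $F$. … Now for each $r$, let $p$ be the next larger even integer; since $\int x^p \, d\mu_{n_k} = m_{n_k}^{(p)} \to m^{(p)}$, these integrals are bounded in $k$. It follows from Theorem 4.… that $\int x^r \, d\mu_{n_k} \to \int x^r \, d\mu$. But the left side also converges to $m^{(r)}$ by (8).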

Hence by the uniqueness hypothesis $\mu$ is the p.m. determined by $F$. … Hence the theorem follows from Theorem 4.…

… Exercise 3 may be reduced to Exercise 10 of Sec. … Let $F_n$ … and the positive normal d.f. …

Random series. Simple limit theorems.

The various concepts of Chapter 4 will be applied to the so-called "law of large numbers", a famous name in the theory of probability.

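This, of course, presupposes the finiteness of $\mathscr{E}(S_n)$, where $S_n = \sum_{j=1}^n X_j$. A natural generalization is as follows:

$\dfrac{S_n - a_n}{b_n} \to 0$,

where $\{a_n\}$ is a sequence of real numbers and $\{b_n\}$ a sequence of positive numbers tending to infinity. We shall present several stages of the development. … The simplest cases follow from Theorems 4.… The idea then is to introduce certain assumptions to cause enough cancellation among the "mixed terms" $\mathscr{E}(X_j X_k)$, $j \ne k$, in (3). A salient feature of probability theory and its applications is that such assumptions are not only permissible but realistic.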

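Two r.v.'s $X$ and $Y$ are said to be uncorrelated iff both have finite second moments and $\mathscr{E}(XY) = \mathscr{E}(X)\mathscr{E}(Y)$. The r.v.'s of any family are said to be uncorrelated iff every two of them are. The requirement of finite second moments seems unnecessary, but it does ensure the finiteness of $\mathscr{E}(XY)$ (Cauchy-Schwarz inequality!).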

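Finally, it is obvious that pairwise independence implies uncorrelatedness, provided second moments are finite. Thus (2) becomes applicable: the mixed terms vanish and $\sigma^2(S_n) = \sum_{j=1}^n \sigma^2(X_j)$. We have proved the following result. If the $X_j$'s are uncorrelated and their second moments have a common bound, then $(S_n - \mathscr{E}(S_n))/n \to 0$ in $L^2$ and hence also in pr. This simple theorem is actually due to Chebyshev, who invented his famous inequalities for its proof.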
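For Bernoulli trials the weak law can be compared with exact binomial probabilities. The Python sketch below uses $p = 1/2$ and $\varepsilon = 1/10$, chosen for illustration. It computes $\mathscr{P}\{|S_n/n - p| \ge \varepsilon\}$ exactly with rational arithmetic and checks it against the Chebyshev bound $p(1-p)/(n\varepsilon^2)$, which is what the $L^2$ argument yields. The exact tail is typically much smaller than the bound.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n, p, eps):
    """Exact P(|S_n/n - p| >= eps) for S_n ~ Binomial(n, p), with p and eps Fractions."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - p) >= eps)


def chebyshev_bound(n, p, eps):
    """var(S_n / n) / eps^2 = p(1 - p) / (n eps^2)."""
    return p * (1 - p) / (n * eps**2)


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(binomial_tail(n, p, eps)), float(chebyshev_bound(n, p, eps)))
```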

The next result, due to Rajchman, strengthens the conclusion by proving convergence a.e. This result is interesting by virtue of its simplicity, and serves well to introduce an important method, that of taking subsequences. Theorem 5. Under the same hypotheses as in Theorem 5.
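The subsequence method can be sketched as follows. We assume, without loss of generality, that $\mathscr{E}(X_j) = 0$, $\mathscr{E}(X_jX_k) = 0$ for $j \ne k$, and $\mathscr{E}(X_j^2) \le M$; centering does not increase second moments. The notation $D_n$ and the constants are ours. The first line handles the squares $n^2$ by Chebyshev and Borel-Cantelli, and the rest controls the gaps between consecutive squares.

```latex
% Assumed: E(X_j) = 0, E(X_j X_k) = 0 (j != k), E(X_j^2) <= M; eps > 0 fixed.
\begin{align*}
P\{|S_{n^2}| > n^2\varepsilon\}
  &\le \frac{\mathscr{E}(S_{n^2}^2)}{n^4\varepsilon^2}
   \le \frac{M}{n^2\varepsilon^2},
  \qquad \sum_n \frac{M}{n^2\varepsilon^2} < \infty
  \;\Longrightarrow\; \frac{S_{n^2}}{n^2} \to 0 \text{ a.e.},\\
D_n &= \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|,
\qquad
\mathscr{E}(D_n^2) \le \sum_{k=n^2+1}^{n^2+2n} \mathscr{E}\bigl((S_k - S_{n^2})^2\bigr)
  \le 2n \cdot 2nM = 4n^2M,\\
P\{D_n > n^2\varepsilon\}
  &\le \frac{4M}{n^2\varepsilon^2}
  \;\Longrightarrow\; \frac{D_n}{n^2} \to 0 \text{ a.e.},
\qquad
\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{n^2}
  \quad (n^2 \le k < (n+1)^2).
\end{align*}
```

Both summable bounds feed the Borel-Cantelli lemma, and the last inequality transfers the a.e. convergence from the subsequence $\{n^2\}$ to all $k$.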

We have by (6): In the present case we must show that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference. Put for each The hypotheses of Theorems 5.

The most celebrated, as well as the very first, case of the strong law of large numbers, due to Borel, is formulated in terms of the so-called "normal numbers".

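Fix a $k$, $0 \le k \le 9$, and let $\nu_n^{(k)}(\omega)$ denote the number of digits among the first $n$ decimal digits of $\omega$ that are equal to $k$. The number $\omega$ is called simply normal to the scale 10 iff the limit $\lim_{n} \nu_n^{(k)}(\omega)/n$ exists for each $k$ and is equal to $1/10$. Intuitively, all ten possibilities should be equally likely for each digit of a number picked "at random".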
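Relative digit frequencies are easy to compute for a rational number, whose expansion is eventually periodic. The sketch below, with an example of our choosing, extracts decimal digits of $p/q$ by long division and tabulates the relative frequency of each digit among the first $n$. For $1/7 = 0.\overline{142857}$ the digits $1, 4, 2, 8, 5, 7$ each have frequency $1/6$ and the others $0$, so $1/7$ is not simply normal.

```python
from collections import Counter
from fractions import Fraction

def decimal_digits(p, q, n):
    """First n decimal digits of p/q (0 <= p < q), computed by long division."""
    digits = []
    remainder = p
    for _ in range(n):
        remainder *= 10
        digits.append(remainder // q)
        remainder %= q
    return digits

def digit_frequencies(digits):
    """Relative frequency of each digit k = 0..9 among the given digits."""
    n = len(digits)
    counts = Counter(digits)
    return {k: Fraction(counts[k], n) for k in range(10)}

freqs = digit_frequencies(decimal_digits(1, 7, 600))  # 100 full periods
print({k: str(v) for k, v in freqs.items()})
```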

As for determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are normal, the problem seems beyond the reach of our present capability for mathematics.

In spite of these difficulties, Borel's theorem below asserts that in a perfectly precise sense almost every number is normal. Furthermore, this striking proposition is merely a very particular case of Theorem 5.

Except for a Borel set of measure zero, every number in [0, 1] is simply normal. Consider the probability space $(\mathscr{U}, \mathscr{B}, m)$ in Example 2 of Sec. Let $Z$ be the subset of numbers of the form $m/10^n$.

Just as in Example 4 of Sec. Indeed according to Theorem 5. For a fixed k we define the 5. According to Theorem 5.

The preceding theorem makes a deep impression (at least on the older generation!). If we use the intuitive language of probability, such as coin-tossing, the result sounds almost trite. A mathematician who is unacquainted with, and therefore skeptical of, probability theory tends to regard the last statement as either "obvious" or "unprovable", but he can scarcely question the authenticity of Borel's theorem about ordinary decimals.
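The probabilistic content of Borel's proof is that, under Lebesgue measure on [0, 1], the decimal digits of $\omega$ are independent and uniformly distributed on $\{0, \dots, 9\}$. For each fixed $k$, the strong law then applies to the indicators of $\{x_n = k\}$. The simulation below is only a heuristic illustration under that model: it uses Python's pseudo-random generator as a stand-in for a uniformly chosen $\omega$, with an arbitrary seed. It reports the largest deviation of a digit frequency from $1/10$.

```python
import random
from collections import Counter

def simulated_digit_frequencies(n, seed=0):
    """Relative frequencies of 0..9 among n independent uniform digits,
    modelling the first n decimal digits of a uniformly chosen point of [0, 1]."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(10) for _ in range(n))
    return [counts[k] / n for k in range(10)]

for n in (100, 10_000, 1_000_000):
    freqs = simulated_digit_frequencies(n)
    print(n, max(abs(f - 0.1) for f in freqs))
```

The deviation shrinks roughly like $1/\sqrt{n}$. A simulation of course proves nothing about almost every $\omega$; that is the job of the strong law.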

As a matter of fact, the proof given above, essentially Borel's own, is a lot easier than a straightforward measure-theoretic version, deprived of the intuitive content [see, e.g., Oxford University Press, Inc.].

For any sequence of r.v.'s. Without using Theorem 5.

Note that the full strength of independence is not needed. We may strengthen the definition of a normal number by considering blocks of digits. Let r Prove that for a. Reduce the problem to disjoint blocks, which are independent. The above definition may be further strengthened if we consider different scales of expansion. Prove that almost every number in [0, 1] is completely normal. Let a be completely normal. Show that by looking at the expansion of a in some scale we can rediscover the complete works of Shakespeare from end to end without a single misprint or interruption.

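Let $X$ be an arbitrary r.v. with an absolutely continuous distribution. Prove that with probability one the fractional part of $X$ is a normal number. Prove that the set of real numbers in [0, 1] whose decimal expansions do not contain the digit 2 is of measure zero. Is the sum of two normal numbers, modulo 1, normal? Is the product? Consider the differences between a fixed abnormal number and all normal numbers.

In order to drop any assumption on the second moment, we need a new device, that of "equivalent sequences", due to Khintchine. Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff $\sum_n P\{X_n \ne Y_n\} < \infty$. By the Borel-Cantelli lemma this implies $P\{X_n \ne Y_n \text{ i.o.}\} = 0$. This means that there exists a null set $N$ with the following property: for every $\omega \notin N$, the series $\sum_n \bigl(X_n(\omega) - Y_n(\omega)\bigr)$ consists of zeros from a certain point on.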

Both assertions of the theorem are trivial consequences of this fact. In particular, if converges to X in pr. Hence if 1 n -;; L: The next law of large numbers is due to Khintchine.

Under the stronger hypothesis of total independence, it will be proved again by an entirely different method in Chapter 6. Now the $Y_n$'s are also pairwise independent by Theorem 3. Let us calculate $\sigma^2(T_n)$; we have by (6) of Sec.
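In Khintchine's argument the $X_n$ are truncated at level $n$, and for identically distributed $X_n$ the truncated sequence is equivalent to $\{X_n\}$ exactly when $\sum_n P\{|X_1| > n\} < \infty$. That holds iff $\mathscr{E}|X_1| < \infty$, because $\sum_{n\ge1} P\{|X_1| > n\} \le \mathscr{E}|X_1| \le 1 + \sum_{n\ge1} P\{|X_1| > n\}$. The sketch checks this numerically for a Pareto law with $P\{X > x\} = x^{-\alpha}$, $x \ge 1$. The choice of law and of the parameters $\alpha = 2$ (finite mean $2$) and $\alpha = 1$ (infinite mean) is ours.

```python
import math

def tail_sum(alpha, N):
    """sum_{n=1}^N P(X > n) for a Pareto law with P(X > x) = x**(-alpha), x >= 1."""
    return sum(n ** (-alpha) for n in range(1, N + 1))

def pareto_mean(alpha):
    """E[X] = alpha / (alpha - 1) for alpha > 1, infinite otherwise."""
    return alpha / (alpha - 1) if alpha > 1 else math.inf

for alpha in (2.0, 1.0):
    for N in (10**2, 10**4, 10**6):
        print(alpha, N, tail_sum(alpha, N), pareto_mean(alpha))
```

For $\alpha = 2$ the sums settle near $\pi^2/6 \approx 1.645$, between $\mathscr{E}(X) - 1 = 1$ and $\mathscr{E}(X) = 2$, so truncation yields an equivalent sequence. For $\alpha = 1$ they grow like $\log N$, matching $\mathscr{E}(X) = \infty$.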

For totally independent r.v.'s, necessary and sufficient conditions for the weak law of large numbers in the most general formulation, due to Kolmogorov and Feller, are known. The sufficiency of the following criterion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and Kolmogorov [12]). Suppose that we have b2 n ]. Condition (7) may be written as: In general it ensures that none of the distributions is too far off center, and it is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below.

Is it clear why? As an application of Theorem 5. However, we can prove more. [St. Petersburg paradox] A game in which you win $2^n$ if it takes $n$ tosses of a coin to obtain a head. What would you consider as a fair entry fee? Apply Theorem 5. Conditions (i) and (ii) in Theorem 5.
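The sufficiency part of the criterion is a truncation argument. With $b_n \uparrow \infty$, suppose $\sum_{j\le n} P\{|X_j| > b_n\} \to 0$ and $b_n^{-2}\sum_{j\le n}\mathscr{E}(X_j^2; |X_j| \le b_n) \to 0$. Then $(S_n - a_n)/b_n \to 0$ in pr., where $a_n = \sum_{j\le n}\mathscr{E}(X_j; |X_j| \le b_n)$. For the St. Petersburg payoff $X = 2^N$, $P\{N = k\} = 2^{-k}$, every quantity here has a closed form. The sketch evaluates them exactly for $n = 2^m$, assuming the normalization $b_n = n\log_2 n$, which is our choice and not stated above.

```python
from fractions import Fraction

def st_petersburg_quantities(m):
    """For n = 2**m i.i.d. St. Petersburg payoffs X = 2**N, P(N = k) = 2**-k (k >= 1),
    and the assumed normalization b_n = n log2 n = m * 2**m, return exactly
    (condition (i), condition (ii), a_n / b_n) of the truncation criterion."""
    n = 2 ** m
    b = m * n                         # b_n = n log2 n, an integer for n = 2**m
    K = b.bit_length() - 1            # largest k with 2**k <= b
    tail = Fraction(1, 2 ** K)        # P(X > b) = sum_{k > K} 2**-k = 2**-K
    trunc_mean = K                    # E[X; X <= b] = sum_{k <= K} 2**k * 2**-k
    trunc_second = 2 ** (K + 1) - 2   # E[X^2; X <= b] = sum_{k <= K} 2**k
    cond_i = n * tail                                # sum_j P(|X_j| > b_n)
    cond_ii = Fraction(n * trunc_second, b * b)      # b_n^-2 sum_j E[X_j^2; |X_j| <= b_n]
    centering_ratio = Fraction(n * trunc_mean, b)    # a_n / b_n
    return cond_i, cond_ii, centering_ratio

for m in (10, 20, 40, 1000):
    ci, cii, ratio = st_petersburg_quantities(m)
    print(m, float(ci), float(cii), float(ratio))
```

Both conditions tend to $0$, slowly, and $a_n/b_n \to 1$, which is consistent with the classical conclusion that $S_n/(n\log_2 n) \to 1$ in pr. The sketch only checks the hypotheses numerically; it is not a proof.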