1: (Un)Determinism
The theory of dynamical systems allows us to determine the motion of objects under deterministic laws. Newton's development of analytical mechanics showed that the past and future of a system can be computed systematically from the positions and velocities of its particles at any one time, together with the infinitesimal laws of motion. From there, the theory of differential equations and dynamical systems takes over. In these notes, we take a different approach than Newton followed. In reality, one can never measure data precisely enough to determine the state of a system exactly. This inexactness shrouds the determinism of the system, and it invalidates Newton's model when applied to real-life experiments. Stochastic processes model the uncertainty of observations in a physical system.
We can think of a stochastic process as an experiment which generates a range of data. For each particular experiment (or observation), we pick a particular $\omega \in \Omega$ and observe the resulting trajectory of data $\{ X_i(\omega) : i \in T \}$. Because stochastic processes can model any observation of this form, they represent a great many phenomena, and applications abound.
As a naive model of the uncertainty of weather, we may take a stochastic process with state space $\mathcal{S} = \{ \textbf{sunny}, \textbf{rainy} \}$. For $i \in \mathbf{Z}$, we model the weather on day $i$ by a random variable $X_i : \Omega \to \mathcal{S}$. Then $\{ X_i : i \in \mathbf{Z} \}$ is a stochastic process. Of course, real models of the weather use much more complicated state spaces and index sets, but most if not all models will involve a stochastic process, because weather is a chaotic system. Even if we use Newtonian mechanics as an approximation of the weather system, we cannot determine all physical variables which may affect the future, and even the smallest uncertainty will drastically affect the future state (you may know this as 'the butterfly effect').
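To make the toy model concrete, here is a minimal simulation sketch in Python. The particular rule used (tomorrow keeps today's weather with probability `p_stay`, and flips otherwise) is an illustrative assumption, not part of the definition above; each run corresponds to picking one $\omega$ and reading off a trajectory.

```python
import random

STATES = ["sunny", "rainy"]

def simulate_weather(n_days, p_stay=0.7, seed=None):
    """Sample one trajectory X_1, ..., X_n of a toy weather process.

    Assumption for illustration: each day keeps the previous day's
    weather with probability p_stay, and flips otherwise.
    """
    rng = random.Random(seed)
    trajectory = [rng.choice(STATES)]   # X_1 chosen uniformly from S
    for _ in range(n_days - 1):
        prev = trajectory[-1]
        if rng.random() < p_stay:
            trajectory.append(prev)     # weather persists
        else:
            trajectory.append("rainy" if prev == "sunny" else "sunny")
    return trajectory

# One choice of omega corresponds to one observed trajectory:
print(simulate_weather(7, seed=0))
```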
In finance, we often want to model the stock market as an uncertain process, because, as with weather, it is impossible to measure every variable which determines future stock prices (and how would we model a person's internal decision to buy and sell stocks in the future?). We take $\mathcal{S} = \mathbf{R}$, and let $X_i$ be the value of a certain stock at time $i$, for $i \in \mathbf{R}$. This is a continuous-time random process with a continuous state space. We will treat continuous stochastic processes like this at a later time, and discover that modelling the stock market leads naturally to the study of Brownian motion, which also describes the way that atoms move.
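In practice one can only simulate such a process on a discrete time grid. The sketch below is one common discretization: independent Gaussian increments, which is an assumption made purely for illustration here; the rigorous connection to Brownian motion is deferred to later notes.

```python
import random

def sample_price_path(n_steps, x0=100.0, sigma=1.0, seed=None):
    """Sample a toy price path on a discrete time grid.

    Assumption for illustration: independent Gaussian increments of
    standard deviation sigma, the standard discrete stand-in for
    Brownian motion.
    """
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

print(sample_price_path(10, seed=0))
```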
In non-parametric statistics, to estimate the CDF of an independent and identically distributed sample $X_1, \dots, X_n \sim F$ from a distribution over the real numbers, we take the estimate \[ \hat{F}(t) = \sum_{i = 1}^n \frac{\mathbf{I}[X_i \leq t]}{n}. \] For a fixed $t$, $\hat{F}(t)$ is a random variable, so $\{ \hat{F}(t) : t \in \mathbf{R} \}$ is a stochastic process indexed by $t$. The study of this estimator thus reduces to a problem about stochastic processes, and in fact most results about it use the ideas of our current approach in disguise.
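The estimator itself is a one-line computation. The sketch below evaluates $\hat{F}$ at a single point $t$; the sample values are made up for illustration.

```python
def empirical_cdf(sample, t):
    """Evaluate F_hat(t) = (1/n) * #{i : X_i <= t}."""
    n = len(sample)
    return sum(1 for x in sample if x <= t) / n

sample = [0.3, -1.2, 0.8, 0.1, 2.5]   # illustrative data
print(empirical_cdf(sample, 0.5))      # fraction of observations <= 0.5
```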
The examples above show both the boon and the deficiency of our approach. On one hand, a vast number of problems can be phrased in terms of stochastic processes. On the other hand, this means that when we study stochastic processes in full generality we are studying far too many things at once to make meaningful and deep statements. We will discover that the real fun of stochastic processes begins when we add additional assumptions about our data, and discover the resulting properties.
We have shown that a stochastic process is a very general notion of repeatedly observing data. This relies on the fact that it is easy to specify a process by an observation of some data which may have a random component. Most constructions of stochastic processes rely on describing a finite portion of the data, and then extrapolating to form the entire process. It is fortunate that mathematics has an existence theorem showing that, provided your finite-dimensional data is consistent, a stochastic process can always be built to satisfy your needs.
To describe the theorem elegantly, we shall introduce some temporary terminology. Fix a state space $\mathcal{S} \subset \mathbf{R}$ and an index set $T$, over which our stochastic process will be generated. Suppose that each finite subset $A$ of $T$ determines a probability distribution $\mathbf{P}_A$ on the Borel $\sigma$-algebra of $\mathcal{S}^A$. If $B \subset A \subset T$ are finite, then we have an embedding of the Borel $\sigma$-algebras defined by $$ \pi_{B \to A}: \mathcal{B}(\mathcal{S}^B) \to \mathcal{B}(\mathcal{S}^A) $$ $$ \pi_{B \to A}(X) = \{ x \in \mathcal{S}^A : (x_i)_{i \in B} \in X \}, $$ with no restriction on the coordinates $x_j$ for $j \not\in B$. In other words, we just take all of $\mathcal{S}$ as the fiber over each coordinate not in $B$. A family of finite-dimensional distributions $\{ \mathbf{P}_A : A \subset T\ \text{finite} \}$ is consistent if $\mathbf{P}_A \circ \pi_{B \to A} = \mathbf{P}_B$ for any finite $B \subset A \subset T$. It was Kolmogorov's discovery that this is all we need to define a stochastic process.
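As a sanity check of the definition, consider the simplest family one could write down (chosen here purely for illustration): fix a single probability measure $\mu$ on $\mathcal{S}$ and take product measures $\mathbf{P}_A = \mu^{\otimes A}$ for each finite $A \subset T$. Then for finite $B \subset A$ and any Borel set $X \subset \mathcal{S}^B$, $$ \mathbf{P}_A(\pi_{B \to A}(X)) = \mu^{\otimes B}(X) \cdot \prod_{j \in A \setminus B} \mu(\mathcal{S}) = \mathbf{P}_B(X), $$ so the family is consistent, and the theorem produces a process $\{ X_i : i \in T \}$ of independent, identically distributed random variables with common law $\mu$.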
The proof uses the Hahn-Kolmogorov/Carathéodory extension theorem to construct a probability measure on $\mathcal{S}^T$, which can then be taken as the sample space of our random variables $X_i = \pi_i$, the coordinate projections. We leave the technical details to the reader. The proof given should (in principle) extend naturally to any Polish (separable and completely metrizable) state space, but this is not needed here. The random variables specified are not unique, but we shall not worry about this until later on, when we analyze continuous-time Markov processes, where this has more important repercussions.
Now that we have introduced the characters of study, we shall begin building our intuition on the simplest stochastic processes for which interesting results can be obtained: Markov chains.