Here we include different kinds of models that are based on probabilities, also known as stochastic models. They can be analyzed analytically or studied with a stochastic simulation. Generally, first try to understand the problem intuitively. Draw graphs and make tables with the given data. Then be systematic: define the variables, set up the equations, and carry out the calculations.
Every group will participate in a final meeting to discuss final reports and
grading. Meetings will take place on Wednesday October 30 (morning and
afternoon), Thursday October 31 (morning and afternoon) and Friday November 1
(morning). Clearly state when you are available, so that we can schedule your
group appropriately. Please answer this question even if you do not have any
constraints, so that we know this for sure.
Please contact Birgit if none of the times fit you.
A classical application of statistical models is simulation of systems where things happen randomly based on certain probability distributions that are chosen to be as realistic as possible. This is called Monte Carlo simulation. See for example these simple traffic simulation demos: demo1, demo2, demo3, wikipedia (there are lots out there). I intentionally chose some simple demos, so that you can more easily relate to actually creating them yourself. As in many other areas these days, there is a lot of sophisticated software with advanced models and beautiful graphics, where you almost forget that the same basic mathematical techniques are behind it. Briefly discuss the differences in the kind of predictions you can draw from stochastic models, compared to when you have a deterministic model (like for example an astronomical model of planetary motion).
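To make the idea of Monte Carlo simulation concrete, here is a minimal Python sketch of a toy traffic queue. It is not taken from any of the demos: the function names `simulate_queue` and `monte_carlo`, the arrival probability, and the rule that the light lets one car through on every second time step are all assumptions chosen only for illustration.

```python
import random


def simulate_queue(arrival_prob, steps, rng):
    """Run one simulation of a toy single-lane queue; return the final queue length.

    Assumed rules (hypothetical): in every time step a car arrives with
    probability arrival_prob, and on even time steps the light is green
    and lets one waiting car through.
    """
    queue = 0
    for t in range(steps):
        if rng.random() < arrival_prob:
            queue += 1
        if t % 2 == 0 and queue > 0:
            queue -= 1
    return queue


def monte_carlo(runs, arrival_prob=0.4, steps=200, seed=0):
    """Repeat the simulation many times and summarize the outcomes."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = [simulate_queue(arrival_prob, steps, rng) for _ in range(runs)]
    mean = sum(results) / runs
    return mean, min(results), max(results)


if __name__ == "__main__":
    mean, lowest, highest = monte_carlo(runs=1000)
    print(f"mean queue: {mean:.2f}, range: {lowest}..{highest}")
```

Each run gives a different queue length, so the simulation yields a spread of outcomes (a mean and a range) rather than one exact number. That spread is a useful starting point for the discussion question.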
A rental car company has two offices in the cities A and B. Customers are free to leave the cars in any city, independently of where they were rented. Based on collected statistics it is known that a car rented in A is returned there with probability 0.6, and a car rented in B is returned there with probability 0.7.
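The return probabilities above define a two-state Markov chain, which can be explored numerically. The sketch below is only one possible way to set it up; it assumes that every car is rented exactly once per time period, and the names `P`, `step` and `long_run` are illustrative choices, not part of the problem statement.

```python
# Transition matrix: row = city where the car is rented,
# column = city where it is returned (order: A, B).
P = [
    [0.6, 0.4],  # rented in A: returned to A with 0.6, so to B with 0.4
    [0.3, 0.7],  # rented in B: returned to B with 0.7, so to A with 0.3
]


def step(dist, P):
    """Advance the distribution of cars by one rental period."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def long_run(dist, P, n_steps=100):
    """Apply the transition matrix repeatedly (assumes one rental per period)."""
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


if __name__ == "__main__":
    # Start with all cars in A, then with all cars in B, and compare.
    print(long_run([1.0, 0.0], P))
    print(long_run([0.0, 1.0], P))
```

Comparing the two starting points shows whether the long-run share of cars in each city depends on where the fleet started.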
Try out this program that generates text as a Markov chain based on statistics from a template text. The idea is that the program generates the next letter randomly, given the k previous letters, from the distribution p(x_n | x_{n-1}, ..., x_{n-k}) estimated from the template text. (Note that this is not the same as the most probable sequence!) The program works better with larger inputs, so copy and paste some text from somewhere. Try the program for different input texts (e.g. try English and Swedish), and different values of "Order" in the interval 0-5. Discuss your observations (see this as an investigation, the result is where your observations take you!). If you want to see the source code of the generator, it is available here: https://github.com/hay/markov/blob/master/markov.php.
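The principle behind such a generator can be sketched in a few lines of Python. This is not a port of the linked PHP program, only a minimal illustration of the same idea; the function names `build_model` and `generate` are hypothetical, and the template text is assumed to be longer than the chosen order.

```python
import random
from collections import defaultdict


def build_model(text, order):
    """Map every context of `order` letters to the list of letters that follow it.

    Storing followers with repetition means that a uniform random pick from
    the list samples from the estimated conditional distribution.
    """
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(text, order, length, seed=None):
    """Generate up to `length` letters; assumes len(text) > order."""
    rng = random.Random(seed)
    model = build_model(text, order)
    start = rng.randrange(len(text) - order)
    out = text[start:start + order]  # seed the output with a context from the text
    while len(out) < length:
        context = out[len(out) - order:]  # empty string when order is 0
        followers = model.get(context)
        if not followers:
            break  # context only occurs at the very end of the template
        out += rng.choice(followers)
    return out


if __name__ == "__main__":
    template = "the quick brown fox jumps over the lazy dog and the quick cat"
    for k in range(0, 4):
        print(k, repr(generate(template, order=k, length=60, seed=1)))
```

With order 0 the letters are drawn independently according to their overall frequency; higher orders reuse longer fragments of the template.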
A public screening is done of a group of people to find the persons who have the disease X. This is done with a medical test. As with most medical tests, the test is not 100% reliable. It gives a correct result with a probability of 99% if the person has the disease, and with 97% if the person does not have the disease. Prior to the screening, it has been estimated that about 0.33% of the population have the disease. (Note: This means that 0.0033 of the population have the disease, not 0.33!) For a particular person the test has indicated a positive result. What is the probability that the person actually has the disease? Hint: Begin by writing down in mathematical notation what you know from the start. Try also to think what would happen with very extreme or symmetric numbers to explore and understand the problem (this is a good general trick).
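One way to follow the hint about extreme or symmetric numbers is to put Bayes' theorem into a small function and vary the inputs. The sketch below does that; the function name `posterior_positive` and the parameter names `prior`, `sensitivity` (probability of a correct result for a person with the disease) and `specificity` (probability of a correct result for a person without it) are labels introduced here, not notation from the problem.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem.

    prior       : P(disease) before the test
    sensitivity : P(positive | disease)
    specificity : P(negative | no disease)
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # The numbers from the screening problem.
    print(posterior_positive(prior=0.0033, sensitivity=0.99, specificity=0.97))
    # A symmetric case and an extreme case, for comparison.
    print(posterior_positive(prior=0.5, sensitivity=0.9, specificity=0.9))
    print(posterior_positive(prior=0.0033, sensitivity=0.99, specificity=1.0))
```

Varying the prior while keeping the test quality fixed shows how strongly the answer depends on how rare the disease is.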
In statistical expert systems, knowledge is represented with a probability distribution over all the variables. The probability distribution is defined by a Markov graph in the form of cause-effect relations (sometimes called probabilistic, belief, or Bayesian networks). Given the structure of the graph, the parameters can be estimated from statistical data, and inferences and predictions can be made according to the laws of probability by using Bayes' theorem. This problem goes more deeply into the "Asia" example of a Bayesian network that was discussed in the introductory lecture.
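To illustrate how inference in such a network works, here is a minimal Python sketch that computes a conditional probability by summing the joint distribution over the unobserved variable. It does not use the "Asia" network: the three-variable graph (rain and sprinkler both cause wet grass) and all of its probabilities are hypothetical numbers made up for this example.

```python
from itertools import product

# Hypothetical parameters, not estimated from any real data.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(wet grass | rain, sprinkler), keyed by (rain, sprinkler).
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the graph: P(r) * P(s) * P(w | r, s)."""
    p = P_RAIN if rain else 1.0 - P_RAIN
    p *= P_SPRINKLER if sprinkler else 1.0 - P_SPRINKLER
    p_wet = P_WET[(rain, sprinkler)]
    p *= p_wet if wet else 1.0 - p_wet
    return p


def prob_rain_given_wet():
    """P(rain | wet grass), summing out the unobserved sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / denominator


if __name__ == "__main__":
    print(f"P(rain) = {P_RAIN}, P(rain | wet) = {prob_rain_given_wet():.3f}")
```

Enumeration like this is only practical for very small graphs, but it shows the principle: observing an effect changes the probability of its possible causes.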
Your task is to predict the probability of precipitation ("nederbörd") on a given future day. To help you, you have weather statistics from the last five years. Suppose you want to predict if there will be any precipitation on May 19. Should you base your prediction on: