AI代写:CS3049-Likelihood-Weighted-Sampling

Probabilistic graphical models

Your task is to write a program to implement likelihood weighted sampling, as described in lecture 18, to perform inference on an arbitrary probabilistic graphical model (PGM) of boolean random variables. You will need to parse an input text file that encodes the graph and the conditional probability distributions for each variable (conditioned on its parents). The format of the file is given in the section below. Two networks have been defined in this format and can be downloaded, along with two queries each from the forum, in file ass3.zip.

Your program must be able to parse any file of this format to create and populate an internal data structure of the PGM.
Your program must then prompt the user via the console (or read from redirected standard input) for a single variable query, parse the query correctly and evaluate the conditional probability distribution of the query variable, given the evidence. The result must be written as two decimal values (corresponding to the values of P(QueryVar=true|…) and P(QueryVar=false|…)) onto the standard output stream, separated by white space. For example:

1
0.872 0.128

Note that if you write anything else before these two numbers, automark will interpret that output as the answer, almost certainly resulting in a test failure.

File format

The graphical model will be specified by a text file with format:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
N

rv0 rv1 ... rvN-1

0 0 1 ... 0
1 0 0 ... 1
...
0 1 1 ... 0
mat0

mat1

...

matN-1

Here:

  • N is the number of random variables in the network;
  • rv are the random variable names (arbitrary alphanumeric strings);
  • mat are two dimensional arrays of real numbers (in ASCII) that specify the conditional probability table of each random variable conditioned on its parents;
  • The matrix of zeros and ones specifes the directed arcs in the graph; a one (1) in the i,j entry indicates there is an edge from i to j (so the ith variable is a parent of the jth variable.

The format of the Conditional Probability Table (CPT) matrices is a bit subtle. If a node has m parents, then the matrix needs to specify the probability of each outcome (true, false) conditioned on 2^m different combinations of parent values, so the matrix will be 2^m × 2 (rows × columns). Treating true as 1, and false as 0, concatenate the values of the parents in their numerical order from most significant bit to least significant bit (left to right) to create a row index r. The entry in the first column, rth row is then the probability that the variable is true given the values of the other variables (the entry in the corresponding 2nd column is the probability that the variable is false). Thus, the first row of the matrix corresponds to all conditioning variables taken the value false (r = 000 … 0), and the last row has all conditioning variables true (r = 111 … 1).

Deliverables

Write your program in Java or C/C++. In the case of Java, name your program inference.java. Your program must be able to be compiled and run as follows:

1
2
$ javac inference.java
$ java inference graphfile.txt

In the case of C/C++, you must supply a makefile Makefile with a rule called inference to compile your program into a Linux executable binary named inference.bin. Your program must be able to be compiled and run as follows:

1
2
$ make inference
$ ./inference.bin graphfile.txt

NOTE1: If a makefile exists, the marking script will assume that the program is written in C/C++, otherwise Java will be assumed.
NOTE2: graphfile.txt is of course a placeholder for the name of the file containing the PGM structure.

Submission and assessment

You must submit your program on the Computer Science Web Submission System. This means you must create the request under your own SVN repository to store the submission files.

Your code will be compiled and run, testing its outputs for the two networks with two queries each that have been provided to you, and an additional three networks with two queries each. None of the networks will contain more than 16 variables. Since the results are stochastic, the query answers will not be exact and will vary from run to run. The asnwers will therefore be tested against a tolerance of ±0.01 (i.e. your answers must be within 1% of the true values), so you must ensure convergence to this level of precision. If it passes all tests you will get 15% of the overall course mark. The objective of the tests is to check for the correct operation of your implementation. Hence, the basis of the assessment is to compare your results against the expected results. You must also ensure that you have an efficient implementation.

Using other source code

You may not use other source code for this request. You should personally and carefully implement the likelihood weighted sampling to fully understand the concept.