Task Description
n this request your task is to implement the PageRank algorithm in C using the power method described below and then optimise and parallelise your code to ensure peak performance is achieved.
The PageRank algorithm was developed in 1996 by Larry Page and Sergey Brin when they were graduate students at Stanford University. Google and other search engines compare words in search phrases to words in web pages and use ranking algorithms to determine the most relevant results.
PageRank assigns a score to a set of web pages that indicates their importance. The underlying idea behind PageRank is to model a user who is clicking on web pages and following links from one page to another. In this framework, important pages are those which have incoming links from many other pages, or have incoming links from other pages with a high PageRank score, or both.
You are encouraged to ask questions on Ed using the requests category. As with any request, make sure that your work is your own, and you do not share your code or solutions with other students.
The PageRank algorithm
PageRank is an iterative algorithm that is repeated until a stopping criteria is met. The last iteration gives us the result of the search, which is a score per web page. A high score indicates a very relevant web page whereas a low score indicates a not so relevant web page for a search. Sorting the web pages by their scores in descending order gives us the order for the result list of a search query.
Implementation details
Your program must implement the PageRank algorithm using the power method described above.
The header file pagerank.h contains an init function that processes the list of pages and edges from standard input and stores a linked list of pages and corresponding inlinks in the following structs
1 | struct page { |
Your program must produce no errors when built on the lab machines and ed with the command:1
clang -O1 -std=gnu11 -march=native -lm -pthread pagerank.c -o pagerank
Your program output must match the exact output format shown by the prototype and on Ed. You are encouraged to submit your request while you are working on it, so you can obtain some feedback.
Program input
1 | <dampening factor> |
The first line specifies the dampening factor to be used by the program. The second line defines the number of pages to be declared, followed by a list of page names with one name per line. Then, the number of edges in the graph is specified, followed by a list of edges with one edge per line.
The init function reads the input and terminates outputting an error message if the input is invalid.
The input is considered invalid if a page name exceeds the maximum length, the dampening factor is not in the range 0 < d < 1, a page is declared twice, or an edge is defined to a nonexistent page.