Requirement
Key to any retail shop, from Amazon to K-Mart to Coles, is its inventory management system - that is a system to decide what items to stock and how many should the stock be. In this request, we will see an application of Markov Decision Processes (MDPs) in such a system.
The problem
You have been hired to setup the inventory management system for Retail (R). R sells various types of electronics goods, designed and developed by staffs and students. R has multiple retail stores of various sizes, at different locations. Your task is to create a system that uses historical customer behaviour data from a store, to decide what type of items to order/return and how many, so that the store makes the most profit. The rest of this document describes the detail of the problem.
Each retail store of R can be classified into one of five different classes of store, based on their capacities. The 5 types of stores, along with their stocking capacities and maximum ordering capabilities are:
Tiny store. This store sells up to 2 types of items and can stock at most 3 items. It can order at most 2 items per week and can return at most 1 item per week.
Small store. This store sells up to 2 types of items and can stock up to 8 items. It can order at most 3 items per week and can return at most 2 items per week.
Medium store. This store sells up to 3 types of items and can stock up to 8 items. It can order at most 3 items per week and can return at most 2 items per week.
Large store. This store sells up to 5 types of items and can stock up to 10 items. It can order at most 4 items per week and can return at most 2 items per week.
Mega store. This store sells up to 7 types of items and can stock up to 20 items. It can order at most 5 items per week and can return at most 3 items per week.
Each store can order items at no cost. Whenever an item is sold, the store keeps 75% of the payment and passes the rest to R headquarter. For each miss opportunity (i.e., failing to provide an item requested by a customer), the store is fined 25% of the price of the requested item by R headquarter. When a store returns an item, it must pay half of the price of the item to R headquarter.
To simplify the problem, R assumes:
Stocking is done while the store is closed (Saturday-Sunday). Each store can only order the stocks once per week, which is on Saturday morning. The stock will be delivered and arrange on the shelves during the weekend, before it opens again for next week’s operation.
All items ordered will be delivered in good condition, on time. Also, the quality of the goods will not degrade over time.
When the order/return operation causes the store to have more items than its capacity, R will automatically cut the store’s order prior to delivery, so that there will be no excess items in the store. Assuming the types of items are indexed from 1, the items will be cut ascendingly, from items with the lowest index first. This cut will cost the store a penalty fee of $F per item cut.The buying habit of the customers are independent between one another. Also, the buying habit of the customers of one store is independent from those of other stores. However, for each type of items in the store, the amount of items the customer buys do depend on the amount that is available in the store right after the most recent replenishment.
Within the week, the total number of items per type of goods that the customers of a store want to buy is at most equivalent to the total number of items that the store can stock.
The performance of the store is measured weekly, right after the store is closed for the week.
The inventory system should assume that the store will remain open forever.
Although for simplicity in computing the actual profit gained, the system will be tested on a finite number of weeks.
Given the stochastic model of the customers’ behavior, the type of store, and information about the available items in the store immediately before the shopping order is made, the inventory management system should decide which types of items and how many should be ordered and returned, so that the store gains the maximum profit possible. Profit is defined as the income the store gets (i.e., 75% of the total payment received from customer) minus the sum of the cost of returning the items, the fines for miss opportunities, and the penalty fees for having items being cut from the order list by R headquarter.
What you need to do
Your tasks in this request can be classified into 4 parts:
Design the solution. This task contains two main components, i.e., framing the problem as an MDP problem and deciding how to solve the problem. Note that you should design the MDP components, i.e., what are the states, actions, etc., manually. However, the exact parameters of your MDP problem will depend on the given input file. Please be aware that how you define your MDP problem might help or hurt your ability to solve the problem.
Implement your design. Your program is allowed a maximum of 3 minutes computation prior to each simulation run. If you use an online method for solving MDP, at each step during run-time, your program is allowed a maximum of 30 seconds computation time for a tiny/small/medium store and a maximum of 1 minute computation time for a large/mega store. Your program will be run on a PC with the same specification as those in the tutorial rooms. The time requirement is for a program that runs as a single-threaded process. If you use multi-threading, then we will divide the aforementioned time limit with the number of threads you use. Note: You are not allowed to use any library for linear algebra, optimization, and MDP solver.
Implement the basic value iteration to compare your design with the baseline (at least for the tiny and small stores).
Write a report. you only need to answer the questions we post for the Report (see the last part of this document). However, to answer these questions well, you do need to do #3.
Input and Output format
Input format. The input is a single .txt file, containing all the information about the store, the types and price of the items that can be sold at the store, and the store’s customers’ behaviour.
Output format
Your program should output the state, the order list, and the return list at each week.