An optimal policy for partially observable Markov decision processes with non-independent monitors
Purpose - This research investigates the structure of the optimal policy for a discrete-time Markov deterioration system monitored by multiple non-independent monitors. The purpose is to obtain a sufficient condition under which the optimal policy is a control limit policy.

Design/methodology/approach - The model is formulated as a partially observable Markov decision process. The problem is to obtain an optimal policy that minimizes the expected total discounted cost over an infinite horizon.

Findings - The research found that the optimal expected cost function over an infinite horizon yields a control limit policy, provided that the transition probabilities are totally positive of order 2 and the conditional probabilities of the monitors' observations have the weak multivariate monotone likelihood ratio property. Furthermore, the optimal policy is shown to have at most four action regions.

Practical implications - If the optimal policy can be restricted to a control limit policy, the large amount of computation time required to find the optimal procedure can be reduced. This enables the best decision to be identified in a much shorter time.

Originality/value - Previous research studied a deterioration system monitored incompletely by a single monitor. This research considers the case of multiple monitors whose observations are not independent.
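As an illustration of the first condition in the findings, the sketch below checks whether a transition matrix is totally positive of order 2, using the standard definition that every 2x2 minor is non-negative. The function name `is_tp2`, the tolerance `tol`, and the 3-state matrix `P_example` are hypothetical and chosen only for illustration; none of them comes from the paper, and the weak multivariate monotone likelihood ratio condition on the monitors is not covered here.

```python
from itertools import combinations


def is_tp2(matrix, tol=1e-12):
    """Return True if every 2x2 minor of `matrix` is non-negative.

    This is the standard definition of total positivity of order 2 (TP2):
    for all rows i < k and columns j < l,
        matrix[i][j] * matrix[k][l] >= matrix[i][l] * matrix[k][j].
    `tol` is an assumed numerical tolerance for floating-point round-off.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    for i, k in combinations(range(n_rows), 2):      # row pairs with i < k
        for j, l in combinations(range(n_cols), 2):  # column pairs with j < l
            minor = matrix[i][j] * matrix[k][l] - matrix[i][l] * matrix[k][j]
            if minor < -tol:
                return False
    return True


# Hypothetical 3-state deterioration matrix (state 0 = best, state 2 = failed).
# The numbers are invented for illustration and are not taken from the paper.
P_example = [
    [0.7, 0.2, 0.1],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(is_tp2(P_example))
```

A matrix that fails this check, such as `[[0.1, 0.9], [0.9, 0.1]]`, does not meet the TP2 condition, so the sufficient condition stated in the findings would not apply to it.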