When modeling the many different forms of multi-agent multi-agent systems, a significant amount of Markov decision processes, also called MMDPS in some contexts, are used. In this piece of writing, two innovative algorithms that are based on learning automata are proposed as a way to solve MMDPS and decide the policies that will be most effective overall. In many of the techniques that have been developed, the Markov problem is given in the form of a directed graph. The problem can be thought of as existing in different states, each of which is represented by a node in this graph; Actions that lead to transitions from one state to another are represented by directed edges that connect nodes in the network. Each node in the network is equipped with a learning automaton that is in charge of the node's tasks, which are represented by the node's outgoing edges. Each agent moves from one node to another during its journey as they work towards the goal of arriving at the goal state. When the agent is deciding which path to follow next, the learning automation at each node provides guidance so that it can make an informed choice. Activities performed by learning automata along the path traveled by the agent are then rewarded or punished according to the learning algorithm based on the cost of the path traveled by the agent.
This work is licensed under a Creative Commons Attribution 4.0 International License.
You may also start an advanced similarity search for this article.