Georgios Pantazis, "Production line control using deep reinforcement learning techniques", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.103797
Modern production lines are facing challenges, when it comes to maintaining efficiency. These challenges are due to various reasons, including unpredictable machine failures, fluctuating demand, and buffer congestion. These issues result in increased costs and delays, that traditional scheduling methods struggle to handle. As manufacturing systems become more complex, there is a growing need for adaptive control strategies that can optimize production in real-time. In this diploma thesis, we present a deep reinforcement learning approach to optimizing production line control, leveraging SimEvents in Simulink and the Matlab Reinforcement Learning Toolbox. The production system consists of sequential machine stages, buffers, and an assembly mechanism, where inefficiencies, such as buffer congestion and machine downtime, lead to increased operational costs. A Proximal Policy Optimization (PPO)-based reinforcement learning agent was designed to handle machine operations by monitoring key system variables, including buffer levels and machine states. The agent is designed to minimize delays and prevent buffer accumulation, enabling it to learn adaptive scheduling policies that improve overall efficiency and minimize cost. Through a series of simulations, our approach proves to be more effective than traditional optimization methods and yields solutions very close to optimal solutions, which can be obtained for small problem instances by dynamic programming methods. The results of extensive experimental testing show that reinforcement learning successfully improves manufacturing efficiency through better decision-making, pointing to new ways of running production systems.