To meet the high-performance and reliability demands of 5G, the Radio Access Network (RAN) is moving to a cloud-native architecture. The new microservice architecture promises increased operational efficiency and a shorter time-to-market, but it comes at a price. The distributed and virtualized architecture is far more complex than earlier architectures, and the growing number of features it brings makes troubleshooting more difficult. So far, RAN troubleshooters have relied on their expertise to analyze systems manually, but the ever-growing volume of data and the increased complexity make system behavior hard to grasp. This thesis makes three contributions: the proposed machine learning and statistical methods help RAN troubleshooters find deviations in system logs, identify the root cause of these deviations, and improve the system’s observability. These methods learn the application’s behavior from system log events and can identify deviations in behavior from many different aspects. The thesis also demonstrates how observability can be improved by using a new software instrumentation guideline. The guideline enables the tracking of systemized procedures and enhances system understanding. Its purpose is to make RAN developers aware that machine learning can utilize debug information and support their troubleshooting process. To familiarize the reader with the research area, the thesis introduces the challenges and the methods that can be used to detect anomalies, perform root cause analysis, and observe RAN system behavior. The proposed methods are integrated and tested in an advanced 5G test bed to evaluate their accuracy, speed, system impact, and implementation cost. The results demonstrate the advantage of using machine learning and statistical methods when troubleshooting RAN behavior. Machine learning methods similar to those presented in this thesis may help those who troubleshoot RAN and accelerate the development of 5G.
The thesis concludes by presenting potential research areas where this work could be further developed and applied, both in RAN and in other systems.
Page Responsible: Frank Drewes 2024-11-21