The work titled Automatic summarization of phytopathologies using multimodal large language models by Fragkogiannis Georgios is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
Georgios Fragkogiannis, "Automatic summarization of phytopathologies using multimodal large language models", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.105007
Modern agriculture must meet the growing demand for food production while remaining sustainable and efficient. This challenge calls for advanced technologies such as computer vision and large language models, especially where timely and accurate diagnosis of phytopathologies can significantly reduce both production losses and pesticide use. This diploma thesis focuses on the development of a system that recognizes plant diseases from images of leaves. The proposed architecture consists of three main stages. First, a series of vision models detects the leaf, identifies areas showing signs of infection, and classifies the likely disease. Next, a tool-calling system allows a language model to select and invoke the appropriate tools, enriching the diagnostic process. Finally, the system composes a complete, substantiated answer that presents both the recognized disease and the proposed ways of treating it. In the experiments, YOLO models were used for detection and classification; they were trained on custom datasets built from established public datasets in smart agriculture, such as PlantVillage and PlantDoc. In addition, Retrieval-Augmented Generation (RAG) systems were developed to retrieve information on methods for treating the diseases, and vision-language models were fine-tuned to improve accuracy in recognizing phytopathologies.
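The three-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every name here (VisionResult, run_vision_stage, lookup_treatment, describe_lesions, select_tools, compose_answer, diagnose) is hypothetical, the vision stage returns fixed made-up values instead of running YOLO models, and a simple rule stands in for the language model's tool selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class VisionResult:
    leaf_box: Box
    lesion_boxes: List[Box]
    disease: str
    confidence: float


def run_vision_stage(image_path: str) -> VisionResult:
    """Stage 1 stand-in. A real system would run the detection and
    classification models here; this stub returns fixed, made-up values."""
    return VisionResult(
        leaf_box=(10, 10, 200, 200),
        lesion_boxes=[(50, 60, 90, 100)],
        disease="example_disease",
        confidence=0.9,
    )


def lookup_treatment(result: VisionResult) -> str:
    """Hypothetical tool standing in for retrieval of treatment information."""
    return f"[placeholder treatment notes for {result.disease}]"


def describe_lesions(result: VisionResult) -> str:
    """Hypothetical tool that summarizes the detected infected regions."""
    return f"{len(result.lesion_boxes)} infected region(s) detected"


# Registry of tools the selection step may choose from (assumed set).
TOOLS: Dict[str, Callable[[VisionResult], str]] = {
    "lookup_treatment": lookup_treatment,
    "describe_lesions": describe_lesions,
}


def select_tools(result: VisionResult) -> List[str]:
    """Stage 2 stand-in. A language model would choose the tools; a simple
    rule is used here so the sketch runs without any model."""
    chosen = ["lookup_treatment"]
    if result.lesion_boxes:
        chosen.append("describe_lesions")
    return chosen


def compose_answer(result: VisionResult, tool_outputs: Dict[str, str]) -> str:
    """Stage 3: merge the diagnosis and the tool outputs into one answer."""
    lines = [f"Likely disease: {result.disease} (confidence {result.confidence:.2f})"]
    lines += [f"{name}: {output}" for name, output in tool_outputs.items()]
    return "\n".join(lines)


def diagnose(image_path: str) -> str:
    """Run the three stages in order for a single leaf image."""
    result = run_vision_stage(image_path)
    outputs = {name: TOOLS[name](result) for name in select_tools(result)}
    return compose_answer(result, outputs)


if __name__ == "__main__":
    print(diagnose("leaf.jpg"))
```

Keeping the tools in a name-to-function registry means the selection step only has to return tool names, which is the shape a tool-calling language model would typically produce. Replacing the rule in select_tools with a model call would leave the other two stages unchanged.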