Please use this identifier to cite or link to this item: https://dspace.univ-ouargla.dz/jspui/handle/123456789/40056
Title: Exploring the Use of Large Language Models for Lossless Text Compression
Authors: Mechalkh, Charaf Eddine
Fennouh, Marya Douniazad
Keywords: Large Language Models
LLMs
Context-Aware
Text Compression
Compression
Issue Date: 2025
Publisher: UNIVERSITY OF KASDI MERBAH OUARGLA
Citation: FACULTY OF NEW TECHNOLOGIES OF INFORMATION AND COMMUNICATION
Abstract: The rapid growth in data generation has led to an increasing demand for efficient data compression techniques. Traditional compression methods, such as Huffman coding, LZ-based algorithms, and arithmetic coding, have proven effective in reducing file sizes. However, these techniques often fail to account for the contextual nature of data, which can limit their performance when handling complex, variable-length content such as text, images, or multimodal data. In recent years, Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating human-like text, making them a promising candidate for enhancing compression techniques through context-awareness. LLMs, with their ability to process large amounts of sequential data and recognize patterns, offer significant potential in improving compression by leveraging context in a more dynamic and adaptive manner. Unlike traditional methods that rely on fixed algorithms, LLM-based compression could adjust to the content being compressed, leading to more efficient encoding and potentially higher compression ratios. This thesis explores the potential of LLMs in context-aware compression. We investigate how LLMs, specifically GPT-like models, can be integrated into compression pipelines to optimize encoding strategies based on the context within the data. Our objectives are to assess the advantages of LLM-enhanced compression methods compared to traditional techniques and demonstrate how context-awareness can lead to more efficient compression, particularly in complex or varied datasets. The results of our study show that LLM-based approaches can outperform traditional methods in certain scenarios, offering promising avenues for future research and practical applications in data compression.
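The abstract's core claim can be illustrated with a minimal sketch that is not from the thesis itself: a predictor that conditions on context assigns higher probabilities to the next symbol, and under arithmetic coding each symbol costs roughly -log2(p) bits, so better prediction means fewer bits. Here an order-1 character model with add-one smoothing stands in for an LLM's next-token distribution, compared against a fixed order-0 frequency model; the sample string is invented for demonstration.

```python
import math
from collections import Counter

def bits_static(text):
    # Order-0 model: fixed per-symbol frequencies estimated over the whole
    # text (an idealized bound for context-free Huffman/arithmetic coding).
    freq = Counter(text)
    n = len(text)
    return sum(-math.log2(freq[c] / n) for c in text)

def bits_contextual(text):
    # Order-1 model: predicts each symbol from the preceding one, a toy
    # stand-in for a context-aware (LLM-style) next-symbol distribution.
    # Add-one smoothing keeps every probability nonzero.
    alphabet = sorted(set(text))
    k = len(alphabet)
    pair = Counter(zip(text, text[1:]))   # (previous, current) counts
    ctx = Counter(text[:-1])              # how often each context occurs
    total = math.log2(k)                  # first symbol: uniform cost
    for prev, cur in zip(text, text[1:]):
        p = (pair[(prev, cur)] + 1) / (ctx[prev] + k)
        total += -math.log2(p)
    return total

sample = "the theory there then the other three"
print(f"static:     {bits_static(sample):.1f} bits")
print(f"contextual: {bits_contextual(sample):.1f} bits")
```

On repetitive text like the sample, the contextual model spends markedly fewer bits because transitions such as "t"→"h" are nearly deterministic; this gap is exactly what an LLM-driven predictor aims to exploit at scale.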
Description: Artificial Intelligence and Data Science
URI: https://dspace.univ-ouargla.dz/jspui/handle/123456789/40056
Appears in Collections:Département d'informatique et technologie de l'information - Master

Files in This Item:
File: FENNOUH.pdf
Description: Artificial Intelligence and Data Science
Size: 896,24 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.