Design and Optimization of a Privacy-Preserving Speaker Identification System Using Pre-Trained Deep Learning Embeddings and Cancelable Biometrics

Khaldi, Belal; BOUSNINA, IMANE

Please use this identifier to cite or link to this item: https://dspace.univ-ouargla.dz/jspui/handle/123456789/40052

Full metadata record

DC Field	Value	Language
dc.contributor.author	Khaldi, Belal	-
dc.contributor.author	BOUSNINA, IMANE	-
dc.date.accessioned	2026-01-26T11:09:32Z	-
dc.date.available	2026-01-26T11:09:32Z	-
dc.date.issued	2025	-
dc.identifier.citation	FACULTY OF NEW TECHNOLOGIES OF INFORMATION AND COMMUNICATION	en_US
dc.identifier.uri	https://dspace.univ-ouargla.dz/jspui/handle/123456789/40052	-
dc.description	Artificial Intelligence and Data Science	en_US
dc.description.abstract	Biometric authentication is increasingly adopted for secure system access, replacing traditional passwords with unique physiological or behavioral traits. Raw biometrics, however, cannot be changed once compromised, so a leak risks permanent identity theft. Cancelable biometrics mitigate this by transforming traits into revocable, non-invertible templates, thereby enhancing security and privacy. This thesis develops a privacy-preserving speaker identification system that integrates Mel-spectrogram preprocessing to convert audio into time-frequency representations, pre-trained SpeechBrain ECAPA-TDNN embeddings to extract 192-dimensional feature vectors, and a Support Vector Machine (SVM) classifier with a linear kernel (C=10.0) for user classification. Privacy is fortified through Double Random Phase Encoding (DRPE), which encrypts speaker templates using two random phase masks applied in the spatial and frequency domains via Fast Fourier Transform (FFT) and inverse FFT operations. The methodology encompasses metadata extraction from a 240-user dataset, audio resampling to 16,000 Hz, embedding generation and validation, SVM training with 7,200 training and 3,120 test samples, template creation by averaging embeddings, DRPE encryption, and identity verification using cosine similarity scores computed on encrypted pairs. Analysis, supported by confusion matrices and Receiver Operating Characteristic (ROC) curves, highlights challenges such as voice similarity between users. The system achieves a preliminary accuracy of approximately 99%, and threshold optimization is ongoing. Distinct from conventional methods, this system combines pre-trained deep learning embeddings with cancelable biometrics, balancing security and reliability. Future work will refine encryption parameters, address user-specific variability, and explore deep learning enhancements for spectrogram processing.	en_US
dc.description.sponsorship	Department of Computer Science and Information Technology	en_US
dc.language.iso	en	en_US
dc.publisher	UNIVERSITY OF KASDI MERBAH OUARGLA	en_US
dc.subject	Speaker identification	en_US
dc.subject	cancelable biometrics	en_US
dc.subject	pre-trained deep learning	en_US
dc.subject	ECAPA-TDNN	en_US
dc.subject	DRPE	en_US
dc.title	Design and Optimization of a Privacy-Preserving Speaker Identification System Using Pre-Trained Deep Learning Embeddings and Cancelable Biometrics	en_US
dc.type	Thesis	en_US
Appears in Collections:	Département d'informatique et technologie de l'information - Master

Files in This Item:

File	Description	Size	Format
BOUSNINA.pdf	Artificial Intelligence and Data Science	1,09 MB	Adobe PDF
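
Illustrative note on the abstract: the template protection and verification steps it describes (averaging embeddings into a template, DRPE encryption with two random phase masks via FFT and inverse FFT, and cosine similarity on encrypted pairs) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the thesis implementation. The function names, the seed-based key handling, and the random stand-in vectors are hypothetical; the abstract does not say how the masks are generated or stored, and real inputs would be 192-dimensional ECAPA-TDNN embeddings rather than random data.

```python
import numpy as np

EMBEDDING_DIM = 192  # embedding size stated in the abstract


def make_phase_masks(dim, seed):
    """Return two unit-modulus random phase masks (hypothetical key scheme: one seed per user)."""
    rng = np.random.default_rng(seed)
    mask1 = np.exp(2j * np.pi * rng.random(dim))  # applied in the input domain
    mask2 = np.exp(2j * np.pi * rng.random(dim))  # applied in the frequency domain
    return mask1, mask2


def drpe_encrypt(template, mask1, mask2):
    """Classic DRPE on a 1-D vector: IFFT(FFT(x * mask1) * mask2)."""
    return np.fft.ifft(np.fft.fft(template * mask1) * mask2)


def drpe_decrypt(cipher, mask1, mask2):
    """Inverse of drpe_encrypt; only possible when both masks are known."""
    return np.fft.ifft(np.fft.fft(cipher) * np.conj(mask2)) * np.conj(mask1)


def cosine_similarity(a, b):
    """Cosine similarity for real or complex vectors (real part of the Hermitian inner product)."""
    return float(np.real(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in data (assumption): random vectors in place of real speaker embeddings.
rng = np.random.default_rng(0)
enrollment_embeddings = rng.standard_normal((5, EMBEDDING_DIM))
enrolled_template = enrollment_embeddings.mean(axis=0)  # template = average of embeddings
probe = enrolled_template + 0.1 * rng.standard_normal(EMBEDDING_DIM)

mask1, mask2 = make_phase_masks(EMBEDDING_DIM, seed=42)
enc_template = drpe_encrypt(enrolled_template, mask1, mask2)
enc_probe = drpe_encrypt(probe, mask1, mask2)

# Verification score computed on the encrypted pair.
score = cosine_similarity(enc_template, enc_probe)

# Revocation: a fresh seed yields a different protected template for the same voice.
new_mask1, new_mask2 = make_phase_masks(EMBEDDING_DIM, seed=43)
reissued_template = drpe_encrypt(enrolled_template, new_mask1, new_mask2)
```

In this sketch the unit-modulus masks and the FFT/inverse-FFT pair together form a unitary transform, so the cosine score on two vectors encrypted with the same masks equals the score on the plain vectors. Matching can therefore run on encrypted data, and a compromised template can be replaced by issuing new masks. Whether the thesis relies on this property or scores encrypted pairs differently is not stated in the abstract.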
