Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning
Xi Xuan¹,², Wenxin Zhang³, Zhiyu Li⁴, Jennifer Williams⁵, Ville Hautamäki¹, Tomi Kinnunen¹
¹University of Eastern Finland, ²City University of Hong Kong, ³University of Chinese Academy of Sciences,
⁴University of Science and Technology of China, ⁵University of Southampton
§ 1

The Speech Deepfake Source Verification Challenge


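[Interactive demo: Utterance A and Utterance B, each shown with its mel spectrogram.]
Similarity score: 0.83 on a scale from 0 (different source) to 1 (same source), with decision threshold τ = 0.50, so the pair is predicted Same Source.
Ground truth: ✓ Same Source. Both utterances come from the same TTS system and the same target speaker.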

Scores from RiemanSD-AAM (ResNet34). Audio from MLAAD v8.
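The demo reduces source verification to comparing one similarity score against a threshold τ. Below is a minimal sketch of that decision rule. It assumes the score is a cosine similarity between two fixed-length embeddings; the page does not say how its 0 to 1 score is computed, so treat this as an assumption. `cosine_similarity` and `verify_source` are hypothetical helper names, and the toy vectors are made up, not model outputs.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (range [-1, 1])."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def verify_source(emb_a: Sequence[float], emb_b: Sequence[float],
                  tau: float = 0.50) -> Tuple[float, str]:
    """Hypothetical rule: predict 'Same Source' when score >= tau."""
    score = cosine_similarity(emb_a, emb_b)
    label = "Same Source" if score >= tau else "Different Source"
    return score, label


# Toy 3-D embeddings (illustrative values only).
score, label = verify_source([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])
print(f"{score:.2f} {label}")
```

A raw cosine score lies in [-1, 1]. If the displayed score has been rescaled to [0, 1], τ = 0.50 should be read on that rescaled axis rather than on the raw cosine.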

§ 2

Model Comparison on Hard Case

Same speaker, different TTS systems


AAM-Softmax (baseline): score 0.67, above threshold, predicts "Same Source" ✗ Wrong
ChebySD-AAM (ours): score 0.43, below threshold, predicts "Different Source" ✓ Correct · EER ↓ 1.39% (P-III)
RiemanSD-AAM (ours · best): score 0.31, clearly below threshold, predicts "Different Source" ✓ Correct · EER ↓ 3.16% (P-III)
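The baseline in this comparison is AAM-Softmax. The sketch below shows the standard additive-angular-margin softmax loss (ArcFace-style) for a single training sample. The scale `scale` and margin `margin` defaults are illustrative assumptions, not the paper's hyperparameters. The sketch does not implement ChebySD-AAM or RiemanSD-AAM, whose details are not given on this page.

```python
import math
from typing import Sequence


def aam_softmax_loss(cosines: Sequence[float], target: int,
                     margin: float = 0.2, scale: float = 30.0) -> float:
    """Additive angular margin softmax loss for one sample.

    cosines[j] is cos(theta_j) between the L2-normalized embedding and the
    L2-normalized weight vector of class j (e.g. one source system).
    The margin is added to the target angle only: logit_y = s * cos(theta_y + m).
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos inside its domain
        if j == target:
            theta = math.acos(c)
            logits.append(scale * math.cos(theta + margin))
        else:
            logits.append(scale * c)
    # Numerically stable cross-entropy: logsumexp(logits) - logit_target.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


print(aam_softmax_loss([0.8, 0.1, 0.0], target=0))
```

The margin forces the target-class angle to be smaller than plain softmax would require, which tightens per-class clusters. Production implementations usually also handle θ_y + m > π (for example with a fallback penalty); this sketch omits that case.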
§ 3

Embedding Visualization


[Figure: t-SNE of the learned embeddings, one color per TTS system: Tacotron2-DDC, VITS-neon, suno/bark, XTTS-v2, MeloTTS, FastPitch.]
Each TTS system forms a distinct, compact cluster, indicating effective source separation.
§ 4

Evaluation Protocols

Four protocols are formed by crossing two axes: source visibility (seen/unseen) and speaker condition (same/different). Each protocol contains 27,530 utterances, balanced 1:1. EER and AUC values below are for ResNet34 + RiemanSD-AAM (Table 2).
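EER is the operating point where the false acceptance and false rejection rates are equal. AUC is the area under the ROC curve. Below is a minimal pure-Python sketch that computes both from lists of trial scores. It assumes higher scores mean "same source" and treats same-source trials as the positive class. The page does not state which evaluation toolkit the authors used, and `eer` and `auc` are hypothetical helper names.

```python
from typing import Sequence, Tuple


def eer(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Approximate equal error rate by sweeping thresholds over observed scores.

    pos: scores of same-source trials; neg: scores of different-source trials.
    Rule: accept as 'same source' if score >= threshold.
    Returns (eer, threshold).
    """
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(pos) | set(neg)):
        far = sum(s >= t for s in neg) / len(neg)  # false acceptance rate
        frr = sum(s < t for s in pos) / len(pos)   # false rejection rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores (made up): EER 1/3 at threshold 0.5, AUC 8/9.
print(eer([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]), auc([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]))
```

Sweeping only the observed scores gives a step approximation to the EER. Evaluation toolkits often interpolate between ROC points instead, so small differences from reported values are expected. The pairwise AUC loop is O(n·m) and fine for a sketch, but too slow for large trial lists.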


P-I: Seen Source, Same Speaker
EER 0.68% · AUC 0.998
Sample pair: tacotron2-DDC_ph (same speaker) vs. tacotron2-DDC_ph (same speaker)
P-II: Seen Source, Different Speaker
EER 1.21% · AUC 0.996
Sample pair: VITS-neon (Speaker A) vs. VITS-neon (Speaker B)
P-III · Hard: Unseen Source, Same Speaker
EER 4.08% · AUC 0.988
Hard pair (different source, same voice): overflow (same speaker) vs. VITS (same speaker)
P-IV · Hard: Unseen Source, Different Speaker
EER 7.13% · AUC 0.972
Hard pair (same source, different voices): parler_tts_mini_v1 (Speaker A) vs. parler_tts_mini_v1 (Speaker B)
§ 6

Citation


@inproceedings{xuan2026sdml,
  title  = {Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning},
  author = {Xuan, Xi and Zhang, Wenxin and Li, Zhiyu and Williams, Jennifer and Hautam{\"a}ki, Ville and Kinnunen, Tomi},
  year   = {2026},
  url    = {https://github.com/xxuan-acoustics/RiemannSD-Net},
}