A k-mer Based Sequence Similarity for Pangenomic Analyses / Bonnici, V.; Cracco, A.; Franco, G. - 13164:(2022), pp. 31-44. (Paper presented at the conference LOD2021 - The 7th International Online & Onsite Conference on Machine Learning, Optimization, and Data Science, held in Grasmere, Lake District, England, UK, in October 2021) [10.1007/978-3-030-95470-3_3].
A k-mer Based Sequence Similarity for Pangenomic Analyses
Bonnici V.
2022-01-01
Abstract
In this work we propose an approach to improve the performance of an existing methodology for pangenomic analyses that computes k-mer based sequence similarity via the Jaccard index. Recent studies have shown that this measure performs well at retrieving homology among genetic sequences belonging to a group of genomes. Our improvement is obtained by exploiting a suitable k-mer representation, which enables a fast and memory-efficient computation of sequence similarity. Experimental results on genomes of living organisms from different species provide evidence that the proposed approach improves on a state-of-the-art methodology in terms of running time and memory requirements.
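To make the measure concrete, the following is a minimal Python sketch of k-mer based similarity via the Jaccard index, J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of k-mers occurring in two sequences. It assumes, for illustration only, a compact 2-bit-per-nucleotide integer packing of k-mers (A=0, C=1, G=2, T=3). This is a common encoding but not necessarily the "suitable k-mer representation" used in the paper. The function names `kmer_codes` and `jaccard`, and the convention of returning 0.0 for two empty k-mer sets, are hypothetical choices made for this sketch.

```python
"""Sketch: Jaccard similarity of two DNA sequences over their k-mer sets.

Assumption (illustrative, not taken from the paper): each k-mer is packed
into an integer using 2 bits per nucleotide (A=0, C=1, G=2, T=3).
"""

# Hypothetical 2-bit code per nucleotide.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> set[int]:
    """Return the set of k-mers of `seq`, each packed into an int (2 bits per base).

    A rolling update shifts the current value left by 2 bits, adds the new
    base, and masks to 2k bits. Windows containing non-ACGT characters
    (e.g. 'N') are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1
    codes: set[int] = set()
    value = 0
    valid = 0  # number of consecutive valid bases seen in the current window
    for ch in seq.upper():
        c = _CODE.get(ch)
        if c is None:
            value, valid = 0, 0  # restart after an ambiguous base
            continue
        value = ((value << 2) | c) & mask
        valid += 1
        if valid >= k:
            codes.add(value)
    return codes


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard index |A & B| / |A | B| of the k-mer sets of sequences a and b."""
    ka, kb = kmer_codes(a, k), kmer_codes(b, k)
    union = ka | kb
    if not union:
        return 0.0  # assumed convention when both k-mer sets are empty
    return len(ka & kb) / len(union)


if __name__ == "__main__":
    # 3-mer sets: {ACG, CGT, GTA, TAC} and {ACG, CGT, GTT, TTC, TCG};
    # they share 2 of 7 distinct k-mers, so J = 2/7.
    print(jaccard("ACGTACGT", "ACGTTCGT", 3))
```

Packing each k-mer into a single integer (for k up to 32 it would fit a 64-bit word in a compiled language) makes set membership and intersection cheaper than hashing substrings. This is the kind of time and memory saving that a tailored k-mer representation targets.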