A Machine Learning Approach for Source Code Similarity via Graph-Focused Features

Diana A.; Arceri V.; Bonnici V.; Bagnara R.
2024-01-01

Abstract

Source code similarity aims to recognize common characteristics between two programs by comparing their components. It plays a significant role in many software development and analysis activities and can assist software teams working on large codebases. Existing approaches compute the similarity between two programs from a suitable representation that captures their syntactic and semantic properties. However, they lack explainability and do not generalize to comparisons across multiple languages. Here, we present a preliminary result that provides a graph-focused representation of code, which enables clustering and classification of programs while remaining explainable and generalizable.
Year: 2024
ISBN: 9783031539688
ISBN: 9783031539695
A Machine Learning Approach for Source Code Similarity via Graph-Focused Features / Boldini, G.; Diana, A.; Arceri, V.; Bonnici, V.; Bagnara, R. - LNCS 14505 (2024), pp. 53-67. (Paper presented at the conference LOD2023 - The 9th International Conference on Machine Learning, Optimization, and Data Science, held in Grasmere, Lake District, England, UK, in October 2023) [10.1007/978-3-031-53969-5_5].

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/2988653