A Machine Learning Approach for Source Code Similarity via Graph-Focused Features / Boldini, G.; Diana, A.; Arceri, V.; Bonnici, V.; Bagnara, R. - 14505 LNCS (2024), pp. 53-67. (Paper presented at the conference LOD2023 - The 9th International Conference on Machine Learning, Optimization, and Data Science, held in Grasmere, Lake District, England – UK, in 10/2023) [10.1007/978-3-031-53969-5_5].
A Machine Learning Approach for Source Code Similarity via Graph-Focused Features
Boldini G.; Diana A.; Arceri V.; Bonnici V.; Bagnara R.
2024-01-01
Abstract
Source code similarity aims to recognize characteristics shared by two different pieces of code by comparing their components. It plays a significant role in many software development and analysis activities that have the potential to assist software teams working on large codebases. Existing approaches compute the similarity between two pieces of code from a suitable representation that captures their syntactic and semantic properties. However, they lack explainability and do not generalize to comparisons across multiple languages. Here, we present a preliminary result that attempts to provide a graph-focused representation of code, through which programs can be clustered and classified while exposing explainability and generalizability characteristics.