Due to its interdisciplinary nature, the development of data science code is subject to a wide range of potential mistakes that can easily compromise the final results. Several tools have been proposed to help the data scientist identify the most common low-level programming issues. We discuss the steps needed to implement a tool that focuses instead on higher-level errors that are specific to the data science pipeline. To this end, we propose a static analysis that assigns ad hoc abstract datatypes to the program variables; these datatypes are then checked for consistency at calls to functions defined in data science libraries. By adopting a descriptive (rather than prescriptive) abstract type system, we obtain a linter tool that reports data-science-related code smells. Although still a work in progress, the current prototype is able to identify and report the code smells contained in several examples of questionable data science code.
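To make the idea concrete, the following is a minimal sketch of such an analysis. It is not the prototype described in the paper. It assumes a tiny, hypothetical set of abstract datatypes (`DataFrame`, `ScaledData`, `Unknown`) and one hypothetical smell chosen for illustration: data that was transformed with `fit_transform` before being passed to `train_test_split`, which may indicate data leakage. The class name `SmellLinter` and the helper `lint` are likewise illustrative. In keeping with a descriptive type system, the sketch only reports a smell and never rejects the program.

```python
import ast

# Hypothetical abstract datatypes for this sketch (not the paper's actual domain).
DATAFRAME = "DataFrame"
SCALED = "ScaledData"  # result of a fitted transformation
UNKNOWN = "Unknown"


class SmellLinter(ast.NodeVisitor):
    """Assigns abstract datatypes to variables and checks them at library calls."""

    def __init__(self):
        self.env = {}     # variable name -> abstract datatype
        self.smells = []  # list of (line number, message)

    def abstract_type(self, node):
        """Return the abstract datatype of an expression node."""
        if isinstance(node, ast.Name):
            return self.env.get(node.id, UNKNOWN)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr == "read_csv":
                return DATAFRAME
            if node.func.attr == "fit_transform":
                return SCALED
        return UNKNOWN

    def visit_Assign(self, node):
        # Check calls on the right-hand side before updating the environment.
        self.generic_visit(node)
        value_type = self.abstract_type(node.value)
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.env[target.id] = value_type

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name):
            name = node.func.id
        else:
            name = getattr(node.func, "attr", None)
        if name == "train_test_split":
            for arg in node.args:
                if self.abstract_type(arg) == SCALED:
                    self.smells.append(
                        (node.lineno,
                         "fit_transform applied before train/test split "
                         "(possible data leakage)")
                    )


def lint(source):
    """Parse the source text and return the list of reported smells."""
    linter = SmellLinter()
    linter.visit(ast.parse(source))
    return linter.smells


SOURCE = """\
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")
X = StandardScaler().fit_transform(df)
X_train, X_test = train_test_split(X)
"""

for line, message in lint(SOURCE):
    print(f"line {line}: {message}")
```

The analyzed program is never executed: the sketch only parses its text, so pandas and scikit-learn do not need to be installed. On the sample above it reports one smell, at the line containing the call to `train_test_split`. A realistic tool would need a much richer abstract domain, as well as handling of control flow and aliasing.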

Towards a High Level Linter for Data Science / Dolcetti, Greta; Cortesi, Agostino; Urban, Caterina; Zaffanella, Enea. - Electronic. - (2024), pp. 18-25. (10th ACM SIGPLAN International Workshop on Numerical and Symbolic Abstract Domains, NSAD 2024, Pasadena (CA), USA, 2024) [10.1145/3689609.3689996].

Towards a High Level Linter for Data Science

Enea Zaffanella
2024-01-01


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11381/3030613