A Comparative Analysis of Crypto API Misuses Across Programming Languages

6 May 2024


(1) Anna-Katharina Wickert, Technische Universität Darmstadt, Darmstadt, Germany (wickert@cs.tu-darmstadt.de);

(2) Lars Baumgärtner, Technische Universität Darmstadt, Darmstadt, Germany (baumgaertner@cs.tu-darmstadt.de);

(3) Florian Breitfelder, Technische Universität Darmstadt, Darmstadt, Germany (florian.breitfelder@tu-darmstadt.de);

(4) Mira Mezini, Technische Universität Darmstadt, Darmstadt, Germany (mezini@cs.tu-darmstadt.de).

Abstract and 1 Introduction

2 Background

3 Design and Implementation of Licma and 3.1 Design

3.2 Implementation

4 Methodology and 4.1 Searching and Downloading Python Apps

4.2 Comparison with Previous Studies

5 Evaluation and 5.1 GitHub Python Projects

5.2 MicroPython

6 Comparison with previous studies

7 Threats to Validity

8 Related Work

9 Conclusion, Acknowledgments, and References


One motivation of this paper was to shed empirical light on whether Python crypto libraries help developers write more secure code than previous empirical studies of Java and C have revealed. We therefore compare our results to the findings of these studies. Among applications that use crypto, we observe more secure applications than reported by Egele et al. [4]: while at least one misuse occurs in 87.90 % of the Android apps using crypto libraries, we observe "only" 52.26 % of Python applications with a misuse. Unfortunately, Zhang et al. [13] report only the number of C firmware images they started to analyze and do not explicitly mention how many of these actually use a crypto API. Thus, we only know that 24.18 % of the analyzed firmware images have at least one misuse.
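The comparison above depends on the denominator behind each percentage. The following minimal sketch shows how the same number of misusing applications yields different rates depending on whether it is divided by all analyzed applications or only by those that use a crypto API. The function name `misuse_rate` and all counts are hypothetical and chosen purely for illustration; they are not taken from any of the studies.

```python
def misuse_rate(apps_with_misuse: int, denominator: int) -> float:
    """Return the percentage of applications with at least one misuse."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= apps_with_misuse <= denominator:
        raise ValueError("apps_with_misuse must lie between 0 and denominator")
    return 100.0 * apps_with_misuse / denominator


# Hypothetical counts, for illustration only (not data from any study).
analyzed = 1000      # all applications that were analyzed
using_crypto = 400   # subset that actually calls a crypto API
with_misuse = 200    # subset of those with at least one misuse

rate_over_all = misuse_rate(with_misuse, analyzed)
rate_over_crypto_users = misuse_rate(with_misuse, using_crypto)

print(f"over all analyzed apps:  {rate_over_all:.2f} %")
print(f"over crypto-using apps:  {rate_over_crypto_users:.2f} %")
```

Because the crypto-using applications are a subset of all analyzed ones, a rate computed over all analyzed applications can never exceed the rate computed over crypto users only. Under that reading, the 24.18 % for C firmware images is a lower bound on the rate among images that use a crypto API and is not directly comparable to the other two figures.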

In Figure 4 we present, per study and per rule, the percentage of applications with a misuse. In general, we observed that misuses occur less frequently in Python than in Java and C. Using ECB as an encryption mode is the most frequent misuse in Java and C, but it is significantly less common in Python, affecting only 3.23 % of the Python applications. We hypothesize that this difference is due to the design of the libraries, as discussed in Section 5.1.
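To make the ECB rule concrete, the following minimal sketch flags calls that pass a constant named `MODE_ECB`, assuming PyCryptodome-style code such as `AES.new(key, AES.MODE_ECB)`. It is a purely syntactic check built on Python's standard `ast` module and is not how Licma is implemented; the function name `find_ecb_uses` and the sample snippet are hypothetical.

```python
import ast


def find_ecb_uses(source: str) -> list[int]:
    """Return the line numbers of calls that pass an attribute named MODE_ECB."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Look at positional and keyword arguments alike.
        args = list(node.args) + [kw.value for kw in node.keywords]
        if any(isinstance(a, ast.Attribute) and a.attr == "MODE_ECB" for a in args):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample; it is only parsed, never executed.
SAMPLE = '''
from Crypto.Cipher import AES
cipher_bad = AES.new(key, AES.MODE_ECB)
cipher_ok = AES.new(key, AES.MODE_GCM)
'''

print(find_ecb_uses(SAMPLE))
```

A check like this misses cases where the mode is stored in a variable, imported under another name, or passed as an integer literal, which is why a real analysis needs to track how values flow into the call.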

Observation 5

This paper is available on arXiv under the CC BY 4.0 license.