Ever wonder which programming languages are the most-used in machine learning? How about which artificial intelligence (AI) and data science packages are tapped by developers more frequently than all others? GitHub resolved a few of those mysteries today, in a follow-up to the 2018 Octoverse report it published in October.
The Microsoft-owned platform pulled info on contributions (pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request) made between January 1, 2018 and December 31, 2018. For the most-imported packages, it used data from GitHub’s dependency graph, which includes all public repositories and any private repositories that have opted in.

Above: The most popular programming languages in machine learning projects on GitHub.
Among contributors to repositories tagged with the “machine-learning” topic, Python is the most common language. That’s not surprising, since it’s the third-most-used language on GitHub overall. C++ is a close second, followed by JavaScript, Java, C#, Julia, Shell, R, TypeScript, and Scala.

Above: The most popular machine learning packages on GitHub.
As for the top packages, NumPy, a package with support for mathematical operations on multidimensional data, is far and away the leader by volume, with three-quarters of AI projects on GitHub using it. The next three most-imported packages (scientific computation toolkit SciPy, dataset management tool pandas, and visualization library Matplotlib) are each used in over 40 percent of projects, as is scikit-learn, the fifth-most imported package.

Above: The most popular machine learning projects on GitHub.
So what about the most popular open source machine learning projects? Google’s open source TensorFlow framework topped the list, followed by scikit-learn and two natural language processing projects, explosion/spaCy and RasaHQ/rasa_nlu. The next four top projects are focused on image processing: CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract.