Since the summer of 2019 I have been looking into package dependency compromises, a subset of software supply chain attacks.
Today a number of popular programming languages make heavy use of more or less centralized package repositories and come with tools that make it easy to rely on third-party packages, which often come with lots of dependencies of their own. But with each dependency the attack surface for package dependency compromises increases - and malicious actors have already used different vectors to inject their payloads into software applications.
I had some ideas on how to monitor package repositories to identify malicious packages or malicious versions of packages. In order to test my ideas, at least in theory, against the attacks that were already uncovered in the past, I compiled a timeline of package dependency compromises. Not wanting to let this effort go to waste, I wrote this blog post summarizing and, to some extent, classifying each incident.
During this process I also came across some great prior research, blog posts, news articles, and some countermeasures already deployed by the package repository teams. I have also included these events and resources in the timeline, in the hope that it will make life easier for others interested in this field - a field which in my opinion will see significant action in the next few years.
In the end I found more and more incidents and research, so I decided to split the blog post into two parts, to “release early, release often”. This part covers everything from 2011, which is when the first notable event that I could find happened, to 2017, which is about halfway through my notes.
Please let me know if you find any relevant incidents or research that are missing.
Benjamin Lee Smith publishes a number of Ruby gems that demonstrate how they can be abused for malicious purposes. He later gives a number of conference talks about his ideas and his experiences distributing these packages, which ping home.
Here is a video recording of his “Hacking With Gems” talk at RuLu 2013.
A selection of the proof of concept packages:
awesome-rails-flash-messages pretends to improve the default Ruby on Rails flash messages, but also writes a file containing requests that include a password parameter to a public web directory and sends this information to a web server.
The post_install package demonstrates the use of the post install hook to execute code once on a developer's machine or similar.
Another package sends whoami information to a server.
John Lyle publishes an article about “nodejs malware” on the Systems Security Blog of the University of Oxford. He asks why nobody has written any malware yet, as “targets are juicy and the protection is minimal”. John doesn’t go into details on how the malicious code could be executed and in the example seems to describe a developer of a useful package turning rogue. It is also pointed out that the npm repository didn’t offer a “Report this package” option at the time.
I could not find any discovered cases of actual malicious packages on NPM, in the sense of package dependency compromise, that were released prior to his blog post.
David Fischer uploads the package “requestes” to the Python Package Index. Its name is one typo away from the popular Python library “requests”, so it blocks at least one potentially typosquatting name for malicious use.
In the package description and through a setup script David makes clear that this package could just as well contain malicious code executed with the rights of the current user.
Insert stern sounding security stuff here…
– David Fischer
Oddly, at the time of writing (7 years later) 4 public GitHub repositories actually depend on this package while also including the actual “requests” library. The package was downloaded 1239 times in January of 2020.
John Lyle publishes a follow-up article “Reflections on nodejs malware” (link to archive.org version).
He now extends the scenario to other package repositories like rubygems and PyPI. Answering the question he brought up in his earlier blog post “Where is all the nodejs malware?” he says:
I can only assume that it is because such malware does not offer a high reward/effort ratio. There is too much low-hanging fruit elsewhere (phishing end users, for example) to make this particular avenue of attack worthwhile.
I think that is because the payload execution method he has in mind is that of a package that integrates with and targets specific application code. But frameworks like Ruby on Rails, which set the near-default for an entire programming language's ecosystem, make interception of interesting data on targeted systems easier, as we see with the PoC Ruby gems mentioned before. The other payload execution method, using setup hooks, avoids integration with application code entirely - but still, setting up a phishing page is easier.
He also discusses the infection vector of publishing malicious versions of reputable packages on a large scale by attacking their developers using a Cross Site Request Forgery vulnerability that the NPM repository had just recently closed at the time.
A potential mitigation method he discusses, which would need to be deployed by repositories and the actual programming languages, is a permission-based system similar to the one used in Android. Access of a package's code to the internet or to specific system calls would need to be approved by the developer at the time of adding the dependency.
He ends “this somewhat rambling blog post by proposing that this is a very promising area of research.”
João Jerónimo writes rimrafall, a proof of concept malicious npm package that tries to execute rm -rf /* /.* through a pre-install hook, thereby deleting all files the current user has access to. This package was uploaded to npm and removed within 2 hours, after it had been linked to on hackernews with a clear statement that it should not be installed.
Adam criticizes the author of “rimrafall” for his immediate public disclosure of the issue and the publishing of a malicious package on NPM, be it a proof of concept or not.
More importantly though, he also raises the issue of typosquatting of packages. With access to the HTTP logs of the npm registry, Adam lists a number of “popular” typos made by developers trying to install certain packages. Examples are “coffeescript” and “coffe-script” instead of “coffee-script”, or “socketio” instead of “socket.io”. This will in fact become a commonly used vector of attack later.
Reddit user chub79 posts a link to a PyPI (Python Package Index) package called
This package is typosquatting the popular “setuptools” package, meaning it imitates the well-known package by being only one typing mistake away.
is executed as part of the setup.py file.
It sends the package name, the current time, the hostname, the public IP and
whether or not the current user is root/has admin rights to a server under
Users are quick to report it to the PyPI security team and the package is removed. The actor used the handle “vacation” and chose “Kenneth Reitz” as their PyPI display name, imitating the well-known author of the popular “requests” library. This library also turned out to be the target of the other two packages uploaded by this user: “requsts” and “reqests”.
Shortly after its public reveal the C&C server displays the following text:
Usage of this project If you see this page then you came here because you installed some honey pot package over pip. All data that is sent to this server is for pentesting purposes. For more information cosider visiting my blog.
Therefore we are once again looking at a proof of concept/research project.
The author of this package turns out to be Nikolai Philipp Tschacher, who will hand in his Bachelor thesis on “Typosquatting in Programming Language Package Managers” more than one year later, in March 2016.
Rory looks at developer account takeover, package repository compromise and even long-term operations by state actors to create, publish, promote and maintain useful packages (or even package repositories) that include non-obvious and plausibly deniable backdoors.
As (partial) countermeasures Rory advises a focus on two-factor authentication for developer accounts on package repositories, signing of packages by default and operational security improvements on the side of the package repositories (for example sending an e-mail to the maintainers of a package if a new version is released).
pizza-party is a simple PoC npm worm written by Chris Contolini (blog post). It adds the spreading code to the install-script of all locally found packages and tries to publish new minor versions of them with the developer's credentials. Afterwards it opens a YouTube video in a browser. This module was not published on npm.
Nikolai describes the attack vector of typosquatting (until then mainly seen in the Domain Name System) and applies it to package repositories. He also investigates how different languages and/or package repositories are more or less affected and how some package repositories already have basic countermeasures in place.
In the empirical part of the thesis he describes the experiment whose beginnings we
have already seen in
Python (PyPI), Node.js (npm) and Ruby (Rubygems.org) were targeted by uploading 214 packages, amounting to a total of 19721 unique installations - most of them (over 15000) through PyPI.
The names of the packages were either automatically generated typos of
async (with a Levenshtein-distance of one) or the names of
standard libraries (which don’t need to be installed through a package manager)
urllib2 in Python 2.
The Top 16 packages based on installation count were fake standard libraries.
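To make the "Levenshtein distance of one" idea concrete, here is a minimal sketch of how such typo candidates could be generated. It is not the generator used in the thesis, which I have not reproduced; the function name and the alphabet of allowed characters are assumptions made for illustration.

```python
import string

# Assumed alphabet of characters that may appear in a package name.
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def typos_at_distance_one(name):
    """Return every string that is exactly one edit away from `name`.

    One edit means a single insertion, deletion or substitution,
    i.e. a Levenshtein distance of one.
    """
    candidates = set()
    for i in range(len(name) + 1):
        # Insert one character at position i.
        for ch in ALPHABET:
            candidates.add(name[:i] + ch + name[i:])
        if i < len(name):
            # Delete the character at position i.
            candidates.add(name[:i] + name[i + 1:])
            # Substitute the character at position i.
            for ch in ALPHABET:
                candidates.add(name[:i] + ch + name[i + 1:])
    # Substituting a character with itself reproduces the original name.
    candidates.discard(name)
    candidates.discard("")
    return candidates


typos = typos_at_distance_one("async")
print(len(typos), "candidate names, for example:", sorted(typos)[:5])
```

A real experiment would additionally have to filter out names that are already registered and pick the typos that people are actually likely to make.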
On the defense side he looks at the design choices made and the features offered by the repositories to prevent certain types of attacks.
npm already doesn’t allow packages named after standard libraries to be uploaded. PyPI and Rubygems.org at the time allowed the registration of packages imitating standard libraries. After Nikolai's previous experiment with the package “setuptool” (instead of “setuptools”) PyPI at least disallowed the re-registration of that particular package name.
The CERT Division of the Software Engineering Institute of the Carnegie Mellon University publishes the vulnerability note VU#319816 based on research by Sam Saccone. It describes a worm in the form of an npm package that spreads itself by pushing new (minor) versions of packages that the infected developer has push permissions for. These new versions all depend on another package that was created automatically using the developer's nodejs credentials. This package contains the “spreading” code that is executed by a post-install hook.
As initial mitigation for the issue, developers are advised to do the following:
- As a user who owns modules you should not stay logged into npm. (Easily enough,
- Use npm shrinkwrap to lock down your dependencies
- Use npm install someModule --ignore-scripts
In response to this npm published a blog post which acknowledges the problem, but also states that “npm cannot guarantee that packages available on the registry are safe”.
npm is working with security vendors to introduce enhanced security vulnerability scanning and mitigation services. This work is underway but not yet ready.
March of 2016 was a turbulent month for the nodejs/npm ecosystem in general: While not exactly a security incident, the infamous “left-pad incident” demonstrated how dependent large parts of the node.js ecosystem are on single, rather small, open source packages. As many teams did not cache the packages that their software depended on, entire build systems could not fetch the now removed packages and deployment queues came to a halt.
The reason for this powerful demonstration is the open source developer Azer Koçulu's decision (his blog post) to remove all his node.js packages from npm, after npm, Inc. (npm's blog post) changed the ownership of one of his packages called “kik” to Kik Interactive Inc., original producers of the messaging app “kik messenger” (Kik's blog post).
The intricacies of this case regarding trademarks, open source software and commercial interests are not relevant in the case of dependency compromise, but the fallout - 273 packages, some of them very popular, deleted from the npm repository - is. Among them is also the package “left-pad”, which many popular npm projects depended on, either directly or indirectly.
According to npm this particularly high-impact package was re-registered within 10 minutes by a good actor, and with the help of npm the full functionality was restored within 2.5 hours.
Nonetheless this incident shows two security-related insights:
NPM, Inc. reacts to this incident by
- changing their policy regarding un-publishing packages, adding a number of conditions that have to be met in order to un-publish a package: either being added just very recently or having a low number of weekly downloads, no other packages dependent on it and a single owner.
- creating placeholder packages in place of unpublished packages, in case other packages depend on the removed package (this likely became unnecessary with the policy change that followed later).
In October 2016 Steve Stagg notices with others at a Python meetup that they can register standard library names in the PyPI package repository.
After e-mailing the security contacts listed for PyPI didn’t elicit a response, he proactively registers the system packages himself and uploads package versions which simply raise RuntimeError("Package 'json' must not be downloaded from pypi") to inform the developer about their mistake.
In January 2017 he opens a GitHub issue on the pypi git repository to make others aware of this security issue, with a description of attack vectors, potential payloads and some statistics. These show that for example the “json” package was downloaded 10,710 times just in December of 2016.
One explanation offered by Steve is that:
There is also the possibility that people have written automatic requirements.txt creators that scrape imports to work out dependencies. In this situation, imports of built-in packages will end-up in requirements files too.
Only after he publishes the post “Building a botnet on PyPi” on hackernoon on May 19th do people start to react. In the meantime the most “popular” package “json” had been downloaded nearly 60,000 times. Issues to disallow the upload of packages that imitate standard libraries are created and will eventually be acted on.
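On the developer side, a requirements file can be checked for names that collide with the standard library. The following is only a rough sketch under a few assumptions: the function name is made up, the parsing is much simpler than what pip does, and sys.stdlib_module_names only exists on Python 3.10 and later, so a small illustrative fallback set is added.

```python
import re
import sys

# sys.stdlib_module_names is available on Python 3.10+. The extra names are a
# small illustrative fallback, not a complete list of standard library modules.
STDLIB_NAMES = set(getattr(sys, "stdlib_module_names", ())) | {
    "json", "os", "sys", "re", "urllib",
}


def stdlib_names_in_requirements(text):
    """Return the requirement names in `text` that are standard library modules."""
    flagged = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip empty lines and option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Take everything before a version specifier, extra or marker.
        name = re.split(r"[\s<>=!~;\[]", line, maxsplit=1)[0]
        if name.lower() in STDLIB_NAMES:
            flagged.append(name)
    return flagged


example = "requests==2.0\njson\n# a comment\nos>=1\n-r other.txt\n"
print(stdlib_names_in_requirements(example))
```

A generated requirements file that lists names like these is a sign that an import scraper, as described in the quote above, has picked up built-in modules.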
- requirements-dev.txt (~3000 monthly downloads)
- requirements-dev (~50 monthly downloads)
- requirements-test.txt (~300 monthly downloads)
These packages do not typosquat an existing package, but prey on users making a
common mistake when trying to install the required packages for a Python project.
These required packages are usually listed in a file called
pip command line tool parses this file and installs all listed packages
when called with
pip install -r requirements.txt
If a user by mistake leaves out the
pip instead attempts to install a
This package does not exist, and this name and 3 common misspellings had
already been blacklisted since 2014
(this was later replaced by a
database table of blacklisted names).
But the name of the file that lists all required dependencies can be freely
chosen by the developer of the project, and other common names are the 4 names
under which Michał registers this package.
Initially the package cheekily tries to imitate the actually intended action
by installing the requirements from the requirements-file.
Using a dynamic version number it manages to convince
pip to install it again
and again, even if it was already present, by pretending to be version 0.0.0
locally, so that there would always be a newer version on PyPI.
This functionality is later removed in favor of a message being printed, indicating that something probably unintended had just happened.
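The mistake these packages prey on is easy to detect before pip ever contacts the index. Below is a minimal sketch of such a check, for example for a wrapper script around pip. The function name is hypothetical, and treating every argument ending in ".txt" as a file name is a simplifying assumption.

```python
# pip options whose following argument is a file name.
FILE_OPTIONS = ("-r", "--requirement", "-c", "--constraint")


def forgotten_requirement_flag(argv):
    """Return arguments that look like requirements files but lack a file option.

    `argv` is the argument list passed to pip, e.g. ["install", "requirements.txt"].
    """
    suspects = []
    previous = None
    for arg in argv:
        looks_like_file = arg.lower().endswith(".txt")
        if looks_like_file and previous not in FILE_OPTIONS:
            suspects.append(arg)
        previous = arg
    return suspects


print(forgotten_requirement_flag(["install", "requirements-dev.txt"]))
print(forgotten_requirement_flag(["install", "-r", "requirements-dev.txt"]))
```

The first call reports the file name because the option is missing, the second call reports nothing.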
* mentioned.md in README and source code, but nowhere to be found. Might not have ever been uploaded according to Michał.
On June 1st 2017 security researcher fate0 publishes the blog post “Package 钓鱼”, translating to “Package fishing”. The blog post is in Chinese, therefore this section will describe its content in a bit more detail.
fate0 initially becomes aware of the issue of dependency typosquatting when he
tries to fix the Python ImportError
No module named smb.SMBConnection by running
pip install smb. As it turns out, this package does not exist and
the required package is called
On the 23rd of May 2017 fate0 creates and registers the following four typosquatting packages
generated with a cookiecutter
template that he later publishes in the GitHub repository
(after the repo called
cookiecutter-evil-pypackage was terminated by GitHub,
as we will soon find out).
The packages send the victim's username, hostname, IP and hostinfo to a
webtask.io HTTP endpoint, which then stores them publicly in issues on the
24 hours later more than 700 issues are created through the secondary account (in Chinese internet speak “vest”) called “evilpackage” that fate0 registered for this purpose.
After this success fate0 decides to extend his footprint and creates more
packages using the same template:
proxy are inspired by the auto-suggested
searches when googling for “pip install”.
memcached are based on popular protocols and open source
In the meantime more than 2000 issues with private data are created in the public repository. An unknown actor starts to feed bogus data to the endpoint, and eventually the account that fate0 had created just for the issue-creation is marked as spam by GitHub and blocked (and quickly unblocked again after an e-mail). fate0 decides to send the user data to a server controlled by him instead, allowing him to implement basic IP rate limiting and thereby stop the wave of bogus data. But before that can take effect the collection repository is removed entirely for breaching GitHub's terms of service. fate0 starts to save the data locally on the VPS and builds a website (archive.org screenshot) that he later links to in the message displayed during installation of the typo packages. The website lists the user data that was uploaded.
On the 31st of May 2017, after some discussion in the aforementioned GitHub issue #644, fate0 deletes the packages (noting that others are now free to re-register them for themselves).
从 2017-05-27 10:38:03 到 2017-05-31 18:24:07，总计 106 个小时内， 有 9726 不重复的 ip 安装了 evil package，平均每个小时有 91 个 ip 安装了 evil package。
[From 2017-05-27 10:38:03 to 2017-05-31 18:24:07, in a total of 106 hours, 9726 unique IPs installed an evil package, an average of 91 IPs installing an evil package every hour.]
The Top 6 packages based on downloads according to fate0 are:
During testing fate0 also notices
the same issue that Michał had demonstrated in January of the same year,
namely that users might mistakenly try to install a package called
instead of installing the requirements listed in a file of the same name.
But fate0 also expands on the issue, making it more pressing.
As previously explained, uploading a package called “requirements.txt” is not possible, because this name is on the hard-coded blacklist at the time. But fate0 looks at how PyPI searches for package names and finds out that any sequence of dots, hyphens and underscores is reduced to one hyphen on both sides of the package name comparisons:
lower(regexp_replace($1, '(\.|_|-)+', '-', 'ig'))
This is done through a function written in SQL called
Therefore a search for
requirements.txt would result in a comparison of
requirements-txt to all package names normalized in the same fashion, making
ReQuIrEmEnTs---tXt a possible result.
Critically, this normalization is not done when comparing the name of a new
package against the blacklist. This allows fate0 to upload the package
requirements-txt to PyPI.
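The gap between the two comparisons can be reproduced in a few lines. This is only a sketch: the normalize function is a Python rendering of the SQL expression quoted above, while the blacklist content and the two check functions are illustrative assumptions and not PyPI's actual code.

```python
import re


def normalize(name):
    """Python rendering of lower(regexp_replace(name, '(\\.|_|-)+', '-', 'ig'))."""
    return re.sub(r"[._-]+", "-", name).lower()


# Illustrative blacklist with a single entry.
BLACKLIST = {"requirements.txt"}


def naive_is_blacklisted(name):
    """Compare the raw name against the blacklist (the flawed check)."""
    return name in BLACKLIST


def normalized_is_blacklisted(name):
    """Normalize both sides before comparing (one way to close the gap)."""
    return normalize(name) in {normalize(entry) for entry in BLACKLIST}


print(normalize("ReQuIrEmEnTs---tXt"))
print(naive_is_blacklisted("requirements-txt"))
print(normalized_is_blacklisted("requirements-txt"))
```

The name requirements-txt passes the naive check although it normalizes to the same string as the blacklisted requirements.txt.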
pip install requirements.txt still returns an error: “Could not
find a version that satisfies the requirement requirements.txt”.
The reason for this error is that the requested package name is compared to the
package names of the available versions listed by PyPI.
But fate0 discovers that this check can be avoided if a
Python “wheel” (packed binary
package format instead of the “source” format) is offered.
In that case only the normalized wheel name is compared to the requested package
name, returning a positive match.
The big advantage of using wheel packages instead of the source packages is that
the wheels are already “built” for your system architecture, therefore there is
no need to execute code in a “setup.py” on installation.
fate0 circumvents this hurdle by making his “requirements-txt” package wheel
depend on another “source” package that he controls, called
This package contains the payload that “reminds users who
@kentcdodds Hi Kent, it looks like this npm package is stealing env variables on install, using your cross-env package as bait
The malicious package sends an HTTP POST request to the server
that contains the environment variables as part of its
and the package repository removes this package together with 37 other packages
published by the same user “hacktask” on the 19th of July:
blog post on this incident
the npm team clarifies that the “baseline” of 39-43 downloads that every package
meets is likely caused by automatic downloads and mirroring of the package
The number of downloads also rises with the number of versions that the mirrors
have to download, and therefore the npm team estimates that “at most 50 real
crossenv” occurred, with
jquery.js coming second in the
number of real installations.
In reaction npm, mostly symbolically, adds the e-mail address of “hacktask” to the blacklist, and suggests that they might add automatic checks for typosquatting of popular packages.
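What such an automatic check could look like is sketched below. This is not npm's implementation, which is not described here; the function names, the rule set (equal names after stripping separators, or an edit distance of at most one) and the short list of popular packages are assumptions for illustration.

```python
import re


def squash(name):
    """Lowercase a name and strip dots, hyphens and underscores."""
    return re.sub(r"[._-]+", "", name.lower())


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def possible_typosquat_of(candidate, popular):
    """Return the popular package names that `candidate` might be imitating."""
    hits = []
    for target in popular:
        if candidate == target:
            continue
        same_without_separators = squash(candidate) == squash(target)
        one_edit_away = edit_distance(candidate.lower(), target.lower()) <= 1
        if same_without_separators or one_edit_away:
            hits.append(target)
    return hits


# Hypothetical list of popular package names.
popular_packages = ["cross-env", "jquery", "mongoose"]
print(possible_typosquat_of("crossenv", popular_packages))
```

A check this simple would produce false positives for legitimate packages with similar names, so it is better suited to flagging uploads for review than to rejecting them outright.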
So far this was the usual reporting on facts. Now I want to present my theory regarding this case:
I think this npm typosquatting attack was inspired by the previously described blog post and research on typosquatting on PyPI by fate0, because of the following reasons:
npm.hacktask.net and npm user) forked the GitHub repository aliyun-node/commands, containing shell helpers to administer nodejs in the Alibaba Cloud, some time after July 13th 2017.
hacktask.net had been using a Chinese nameserver until February 23rd 2017, when they moved behind Cloudflare, similar to
hacktask.org. Both hosted Mandarin-language content.
xss.hacktask.net used to host a Chinese SaaS C&C for XSS probes, as pointed out by diimdeep.
tkinter is a weird package to typosquat on npm, as it is a
TK-GUI interface library for Python
and has no popular counterpart in the nodejs ecosystem with a similar name.
This incident shows that in order to secure users of package repositories we cannot simply focus on our favorite package manager of choice, but have to look across language borders to detect potential threats before they hit.
On September 29th 2017 the Slovakian Computer Security Incident Response Team publishes the advisory skcsirt-sa-20170909-pypi-malicious-code, in which they warn about 10 typosquatting packages that were uploaded to PyPI between the 2nd and 4th of June of the same year. They do an exceptionally good job of sharing the Indicators of Compromise (IOCs) and offering scripts for developers to figure out if they have been affected.
The following packages were identified:
The packages contain the code of the typosquatted target packages, but in
addition send the name of the package, the username of the current user and the
hostname of the machine (XOR “encrypted”) to
SKCSIRT notifies the Python Security Response Team (PSRT) and “all identified packages were taken down immediately”.
The author(s) of the packages added the comment “just toy, no harm :)” to the payload.
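The quotation marks around “encrypted” are deserved: XOR with a fixed key is trivially reversible. Here is a minimal sketch of why, using a made-up key and made-up data, since the actual key and message format of these packages are not given here.

```python
def xor_bytes(data, key):
    """XOR every byte of `data` with the repeating `key`."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical key and payload, chosen only for this example.
key = b"k"
plaintext = b"somepackage:alice:build-host"

obfuscated = xor_bytes(plaintext, key)
# Applying the same operation again restores the original data.
restored = xor_bytes(obfuscated, key)
print(restored == plaintext)

# With a known plaintext the key falls out directly.
recovered_key = xor_bytes(obfuscated, plaintext)[:len(key)]
print(recovered_key == key)
```

Anyone who captures the traffic and can guess part of the content, such as their own hostname, can recover the key and read the rest.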
After reading Nikolai Tschacher's thesis on typosquatting from March 2016 and Steve Stagg's similar story, Benjamin Bach notices that many of the packages that were named after standard library names (or typosquats of them) were simply deleted, making them available for registration again. After trying to alert the PyPI team and Python security team about this issue, according to him without a response, Benjamin starts the “pytosquatting” project together with IT-security journalist Hanno Böck.
They register available standard library names and std-lib typosquats on PyPI and upload their own “blocker”-packages. These would make an HTTP request to a server as part of the setup routine, to count the number of installs. Afterwards they would interrupt the installation process and inform the user of what just happened.
Alleine das Paket urllib2 wird jeden Tag von über Tausend Personen installiert.
[The package urllib2 alone is installed by over a thousand people every day]
Hanno Böck in his article about this on Golem.de [in German]
They later presented the talk “Package Mis-Management” at BornHack 2018 about this project.
On the 18th of September 2017 Pull Request #2409 is merged into PyPI warehouse, the “backoffice” of PyPI. It adds a routine that fuzzy matches new packages against a list of Python standard library names and restricts their registration if there is a match. Issue #2151, opened in reaction to Steve's post on Hackernoon, is closed.
On September 22nd Python core developer Victor Stinner publishes an in-depth incident report on the Python security announcements mailing list addressing the advisory by the SKCSIRT and the issue of typosquatting on PyPI in general. This mailing list was created as a result of the advisory.
Victor's report adds one package, xml (impersonating a standard library), to the list of packages being part of this attack. Additionally it covers the recent history of typosquatting on PyPI, starting with Nikolai's thesis in March 2016, over the fate0 incident to the pytosquatting-project.
The report then discusses a number of mitigation techniques and their advantages and disadvantages, like 3rd party component review, client side typo detection and notification, the blocking of registration for certain package names, and server notifications for similar project names (to be looked at by PyPI admins), which is the solution that was chosen.
I’m now about halfway through the list of package dependency compromise events that I compiled and decided to release this first part of the timeline. If you see an event missing, please let me know, and thanks for reading this far.