Major Data Breaches

If one word (well, actually three) could sum up the 2021 infosecurity year, it would be “supply chain attack”.

A software supply chain attack happens when hackers manipulate the code in third-party software components to compromise the ‘downstream’ applications that use them. In 2021, we saw a dramatic rise in such attacks: high-profile security incidents like the SolarWinds, Kaseya, and Codecov data breaches have shaken enterprises’ confidence in the security practices of third-party service providers.

What does this have to do with secrets, you might ask? In short, a lot. Take the Codecov case (we will come back to it shortly): it is a textbook example of how hackers leverage hardcoded credentials to gain initial access to their victims’ systems and harvest more secrets down the chain.

Secrets in code remain one of the most overlooked vulnerabilities in the application security space, despite being a priority target in hackers’ playbooks. In this article, we will talk about secrets and why keeping them out of source code is today’s number one priority for securing the software development lifecycle.

What is a secret?

Secrets are digital authentication credentials (API keys, certificates, tokens, etc.) used in applications, services, or infrastructure. Much like a password (plus a device, in the case of 2FA) authenticates a person, a secret authenticates a system to enable interoperability. But there is a catch: unlike passwords, secrets are meant to be distributed.

To continually deliver new features, software engineering teams need to interconnect more and more building blocks. Organizations are watching the number of credentials in use across multiple teams (development squads, SRE, DevOps, security, etc.) explode. Sometimes developers keep keys in an insecure location for convenience while changing the code, but such keys are often forgotten and inadvertently published.
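A common way to keep a key out of the code entirely is to inject it at runtime, for example through an environment variable populated by the deployment platform or a secrets manager. The Python sketch below is illustrative only: `get_secret` and `MissingSecretError` are hypothetical names, and the function deliberately fails loudly instead of falling back to a hardcoded default.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def get_secret(name: str) -> str:
    """Read a secret at runtime so its value never lands in version control."""
    value = os.environ.get(name)
    if not value:
        # No hardcoded fallback: a default value here would put the
        # secret right back into the repository and its history.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Anti-pattern, shown only as a comment: a literal such as
#   API_KEY = "sk_live_..."
# becomes part of every clone, fork, and commit of the repository.
```

At startup, a service would call something like `get_secret("PAYMENT_API_KEY")` (a hypothetical variable name); rotating the key then means updating the environment rather than committing new code.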

In the application security landscape, hardcoded secrets are a different kind of vulnerability. First, source code is a very leaky asset: it is meant to be cloned, checked out, and forked on multiple machines very frequently, so the secrets it contains leak too. But, more worryingly, let’s not forget that code also has a memory.

Any codebase is managed with some kind of version control system (VCS) that keeps a historical timeline of all the modifications ever made to it, sometimes over decades. The problem is that still-valid secrets can be hiding anywhere on this timeline, opening a new dimension of the attack surface. Unfortunately, most security analyses only cover the current, ready-to-deploy state of a codebase. In other words, when it comes to credentials living in an old commit or even a never-deployed branch, these tools are completely blind.
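To make the point concrete, here is a minimal Python sketch that models a repository’s history as an ordered list of snapshots (a simplification: real tools walk Git objects rather than an in-memory list). A scan limited to the latest snapshot misses a key that was committed and later deleted, while a scan over the full history still finds it. The commit IDs and file names are hypothetical, the key is AWS’s documented placeholder `AKIAIOSFODNN7EXAMPLE`, and the single pattern only covers AWS access key IDs.

```python
import re
from dataclasses import dataclass

# AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


@dataclass
class Commit:
    sha: str               # hypothetical commit identifier
    files: dict[str, str]  # file path -> file content at this commit


def scan_snapshot(commit: Commit) -> list[tuple[str, str, str]]:
    """Return (sha, path, match) for every key ID found in one snapshot."""
    hits = []
    for path, content in commit.files.items():
        for match in AWS_KEY_ID.findall(content):
            hits.append((commit.sha, path, match))
    return hits


def scan_history(history: list[Commit]) -> list[tuple[str, str, str]]:
    """Scan every commit, not just the latest one."""
    hits = []
    for commit in history:
        hits.extend(scan_snapshot(commit))
    return hits


history = [
    Commit("a1", {"config.py": 'KEY = "AKIAIOSFODNN7EXAMPLE"'}),
    # The key is "removed" in a later commit, but commit a1 still holds it.
    Commit("b2", {"config.py": 'KEY = os.environ["AWS_ACCESS_KEY_ID"]'}),
]

print(scan_snapshot(history[-1]))  # → []
print(scan_history(history))       # → [('a1', 'config.py', 'AKIAIOSFODNN7EXAMPLE')]
```

In a real repository, the equivalent is to inspect every commit on every branch (for instance, the patches printed by `git log -p --all`) rather than only the checked-out files.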

Six million secrets pushed to GitHub

Last year, by monitoring the commits pushed to GitHub in real time, GitGuardian detected more than 6 million leaked secrets, double the number found in 2020. On average, 3 commits out of 1,000 contained a credential, fifty percent more than in 2020.

A large share of those secrets gave access to corporate resources. No wonder, then, that an attacker looking to gain a foothold in an enterprise system would first look at its public repositories on GitHub, and then at those owned by its employees. Many developers use GitHub for personal projects and sometimes leak corporate credentials by mistake (yes, it happens regularly!).

With valid corporate credentials, attackers operate as authorized users, and detecting abuse becomes difficult. A credential can be compromised a mere 4 seconds after being pushed to GitHub, which is why a leaked credential should be revoked and rotated immediately to neutralize the risk of a breach. Whether out of guilt or a lack of technical knowledge, people often take the wrong path to get out of this situation, and it is easy to see why.

Another serious mistake for enterprises would be to tolerate secrets inside non-public repositories. GitGuardian’s State of Secrets Sprawl report highlights that private repositories hide many more secrets than their public counterparts. The hypothesis is that private repositories give their owners a false sense of security, making them less concerned about secrets lurking in the codebase.

That’s ignoring the fact that these forgotten secrets could someday have a devastating impact if harvested by hackers.

To be fair, application security teams are well aware of the problem. But the amount of work to be done to investigate, revoke and rotate the secrets committed every week, or dig through years of uncharted territory, is simply overwhelming.

Headline breaches… and the rest

However, the matter is urgent. Hackers are actively searching GitHub for “dorks”, easily recognized patterns that identify leaked secrets. And GitHub is not the only place where they are active: any registry (like Docker Hub) or any source code leak can become a goldmine of exploitation vectors.
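Such patterns often rely on nothing more than the distinctive prefixes providers put on their credentials. The minimal Python sketch below shows this kind of matching; the rule set is illustrative and far from exhaustive, and it encodes publicly known formats (AWS access key IDs, classic GitHub personal access tokens with the `ghp_` prefix, Stripe live secret keys, PEM private key headers) that providers may change. The minimum length for Stripe keys is an assumption. Defenders can run the same checks on their own code before attackers do.

```python
import re

# Illustrative rules only; real scanners ship many more detectors
# plus validation logic to cut false positives.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_secret": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[str, int]]:
    """Return (rule name, line number) for each line matching a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings


sample = "\n".join([
    'region = "eu-west-1"',
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",  # AWS's documented example key
    "token = ghp_" + "x" * 36,                   # dummy token built at runtime
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(find_secrets(sample))
# → [('aws_access_key_id', 2), ('github_pat_classic', 3), ('private_key_header', 4)]
```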

For evidence, just look at recently disclosed breaches. Codecov, a code coverage tool favored by many open-source projects, was compromised last year by attackers who gained access by extracting a static cloud account credential from its official Docker image. After successfully accessing the official source code repository, they were able to tamper with a CI script and harvest hundreds of secrets from Codecov’s user base.

More recently, Twitch’s entire codebase was leaked, exposing more than 6,000 Git repositories and 3 million documents. Although the leak showed plenty of evidence of a certain level of AppSec maturity, nearly 7,000 secrets were found in it, including hundreds of AWS, Google, Stripe, and GitHub keys. Just a few of them would be enough to launch a full-scale attack on the company’s most critical systems. This time no customer data was leaked, but that was mostly luck.

A few years ago, Uber was not so lucky. An employee accidentally published some corporate code on his own public GitHub repository. Hackers found it and discovered a cloud service provider’s keys granting access to Uber’s infrastructure. A massive breach ensued.

The bottom line is that you can never be sure when a secret will be exploited, but you must be aware that malicious actors are monitoring your developers and looking for your code. Also keep in mind that these incidents are just the tip of the iceberg: many more breaches involving secrets probably go undisclosed.


Secrets are a core component of any software stack, and because they are especially powerful, they require very strong protection. Their distributed nature and modern software development practices make it very hard to control where they end up, be it source code, production logs, Docker images, or instant messaging apps. Secrets detection and remediation capabilities are a must, because even a single leaked secret can be exploited in an attack leading to a major breach. Such scenarios happen every week, and as enterprises adopt more and more services and infrastructure, the number of leaks is growing fast. The earlier action is taken, the easier it is to protect source code from future threats.
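Detection is cheapest before a secret ever reaches a shared repository. Below is a minimal Python sketch of a pre-commit-style check, assuming the staged changes are available as unified diff text (for example, the output of `git diff --cached`). Besides fixed patterns, many scanners also flag long, random-looking strings by their Shannon entropy; this sketch uses only that heuristic, and the 20-character minimum and 4.0 bits-per-character threshold are arbitrary assumptions that trade false positives against missed secrets.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters typical of keys and tokens.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; an assumed cut-off to tune per codebase


def shannon_entropy(s: str) -> float:
    """Average information content per character, in bits."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def suspicious_added_lines(diff: str) -> list[str]:
    """Return high-entropy tokens found on lines the diff adds."""
    hits = []
    for line in diff.splitlines():
        # Only '+' lines are new content; '+++' is the file header.
        # (Simplification: an added line starting with '++' is skipped too.)
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for token in TOKEN.findall(line[1:]):
            if shannon_entropy(token) > ENTROPY_THRESHOLD:
                hits.append(token)
    return hits


diff = "\n".join([
    "+++ b/settings.py",
    "+DEFAULT_CONNECTION_TIMEOUT_SECONDS = 30",  # long but low-entropy name
    '+API_TOKEN = "q8Zr2LmX4vT9bN1kPw7s"',       # dummy random-looking value
    '-API_TOKEN = "placeholder"',                # removed line, not scanned
])
print(suspicious_added_lines(diff))  # → ['q8Zr2LmX4vT9bN1kPw7s']
```

Run as a hook, a non-empty result would block the commit so the value can be moved out of the code before it enters the history.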

Note – This article was written by Thomas Segura, technical content writer at GitGuardian. Thomas has worked as both an analyst and a software engineering consultant for various large French companies.