4 tools to prevent leaks in public code repositories

Secrets stored in Git repositories have long been a thorn in the side of developers and a go-to source for attackers. Ensuring that sensitive information is stored appropriately and scrubbed from repositories has become essential to reducing the likelihood of software being compromised, often in very public ways. While this seems obvious, it’s easy to overlook hardcoded connection strings, passwords, and even plaintext credentials stored by the development tool itself. Visual Studio, for instance, can store SQL connection credentials in plaintext unless told otherwise.

In 2020 alone, GitGuardian detected over 2 million secrets in public repositories. It has been widely speculated that a leaked credential created by an intern played a part in the SolarWinds attack. With high-profile cases like this, it’s worth taking a minute to evaluate whether your own projects could be exposed in the same way.

The trick is finding the secrets to begin with. They are often tucked away in code or obscure XML files, or encoded in ways that make them hard to spot. Scrubbing code manually is error-prone and likely to miss things. Worse, because Git, like other source control systems, retains previous commits, cleaning up an exposed secret takes more than deleting it from the code and committing again. The secret must be purged from the history, which can sometimes mean starting over. That is why it’s important to get this right, and to get it right early in the process.
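
To see why deleting the line is not enough, the following minimal sketch searches the output of `git log -p --all` (every patch on every branch) for lines where a given string was added. It assumes Python 3.9+ and a local clone; `read_full_history`, `find_in_history`, the sample log, and the placeholder value `hunter2` are hypothetical names chosen for illustration. In the sample, the newer commit replaces the hardcoded password, yet the older commit still carries it.

```python
import subprocess


def read_full_history(repo_path: str = ".") -> str:
    """Hypothetical helper: return every patch on every branch of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_in_history(log_text: str, needle: str) -> list[tuple[str, str]]:
    """Return (commit hash, added line) pairs for lines that introduced `needle`."""
    hits = []
    commit = ""
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Header lines look like "commit <sha>" (possibly followed by refs).
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++ "):
            # A single leading "+" marks a line added in this commit's diff;
            # "+++ b/<file>" is the file header, so it is skipped.
            if needle in line:
                hits.append((commit, line[1:].strip()))
    return hits


# Hypothetical, abbreviated `git log -p` output (newest commit first).
SAMPLE_LOG = """\
commit 2222222
Author: Dev <dev@example.com>

    Move password to an environment variable

diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1 +1 @@
-DB_PASSWORD = 'hunter2'
+DB_PASSWORD = os.environ['DB_PASSWORD']
commit 1111111
Author: Dev <dev@example.com>

    Add database config

diff --git a/config.py b/config.py
new file mode 100644
--- /dev/null
+++ b/config.py
@@ -0,0 +1 @@
+DB_PASSWORD = 'hunter2'
"""

if __name__ == "__main__":
    # Against a real clone: find_in_history(read_full_history("path/to/repo"), "hunter2")
    for commit, added in find_in_history(SAMPLE_LOG, "hunter2"):
        print(commit, added)
```

Finding the commit only confirms the exposure; the value itself still has to be purged from the history.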

Fortunately, several tools can help with this problem. Most are command-line tools, though some are web-based. All offer similar functionality but go about it in slightly different ways. The main things they look for are usernames, passwords, private keys, and other potentially sensitive information.
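
Scanners of this kind typically rely, at least in part, on pattern matching. The minimal sketch below illustrates the idea with a handful of regular expressions covering the categories just mentioned: passwords, private keys, an access-key format, and a user/password pair inside a connection string. The rule names, the patterns, and the `scan_text` helper are illustrative assumptions, not the rule set of any particular tool; real tools generally use many more rules and also scan the commit history.

```python
"""Sketch of pattern-based secret detection; the rules are illustrative only."""
import re

# Hypothetical rule set: rule name -> compiled regular expression.
RULES = {
    # PEM header that opens a private key block (RSA, EC, OPENSSH, ...).
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # password = "..." or pwd: '...' style assignments in code or config.
    "password_assignment": re.compile(
        r"""(?i)(?:password|passwd|pwd)\s*[:=]\s*["'][^"']+["']"""
    ),
    # "User ID=...;Password=..." pairs inside ADO.NET-style connection strings.
    "connection_string": re.compile(r"""(?i)\buser id=[^;]+;\s*password=[^;"'\s]+"""),
}


def scan_text(text: str, source: str = "<memory>") -> list[tuple[str, int, str]]:
    """Return (source, line number, rule name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((source, lineno, name))
    return findings


# Hypothetical file contents; "hunter2" and the key values are placeholders.
SAMPLE = "\n".join([
    'conn = "Server=db;User ID=sa;Password=hunter2;"',
    'DB_PASSWORD = "hunter2"',
    "aws_key = 'AKIAIOSFODNN7EXAMPLE'",
    "-----BEGIN RSA PRIVATE KEY-----",
    "print('hello')",
])

if __name__ == "__main__":
    for finding in scan_text(SAMPLE, "sample.py"):
        print(finding)
```

Regular expressions like these are cheap to run but produce false positives and negatives, which is one reason the tools differ in how they tune and extend their detections.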

When deciding which one to use, weigh your own technical skills, the time you have to learn a new tool, whether you need custom detection rules, and your budget.