How Open-Source Distribution Data Can Help To Make The Software Supply Chain More Secure
Avi Press, Forbes Councils Member, Forbes Technology Council
Council Post: expertise from Forbes Councils members, operated under license. Opinions expressed are those of the author.
Aug 15, 2022, 06:30am EDT
Avi Press is a software developer, open-source advocate and founder and CEO of Scarf.
Long gone are the days when you would buy your software on a disk and your company’s IT administrators were tasked with installing and maintaining that software without the help of the automatic online updates that we take for granted today. When a bug or a CVE was discovered, there was very little the software company could do beyond patching it in the next scheduled release and hoping users updated. With no way to be more proactive than that, vulnerabilities would remain in production use long after they had been patched.
We’re facing a similar situation in open-source software today, and it’s worth exploring what it means for the industry as a whole and what we can do about it.

A Likely CVE Scenario

Imagine a severe CVE is exposed in a small but highly relied-upon open-source dependency. The vast majority of users don’t even know they’re relying upon it, so they don’t pay much attention to the creators’ pleas to upgrade.
The specific piece of code is being used in hundreds of thousands of open-source and closed-source projects, infrastructure, consumer electronics and other applications. How can we most efficiently get the entire world to upgrade its software so that no one is using this vulnerable piece of code? Under our current setup, this problem poses significant challenges. We simply don’t have a grasp on who is using that code and in what contexts.
The very nature of open-source code is that it often becomes inextricably connected to underlying infrastructure that then builds its foundations on potentially vulnerable code. Look at Log4j, for example. This internet vulnerability is one of the most serious we’ve ever come across, and it’s very hard to eliminate because it is at the core of so much of our modern technological infrastructure.
Even with widespread awareness around this particular vulnerability, it’s very difficult to identify the millions of instances so we can rectify them. We simply don’t have the distribution data that we need to make a dent.

What Can We Do About It?

The core of a potential solution is better data.
If maintainers knew which organizations relied on their software, they’d be in a much better position to help those people upgrade and patch the vulnerability. They could identify where to deploy effort and resources in fixing the problem proactively at scale. In fact, the data could even unlock a new industry—for companies to offer consultation and support services related to these key open-source vulnerabilities.
However, none of this works without knowing who you need to help. Remember that open-source code is repackaged and redistributed in complex ways. It’s not just about your primary users; it’s also about their users, their users’ users and so on.
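The "users' users" problem can be pictured as a walk over a reverse dependency graph. The sketch below is only an illustration under stated assumptions: the package names and the `DIRECT_DEPENDENTS` mapping are hypothetical, and in practice no single source holds this graph for the whole ecosystem, which is exactly the gap being described.

```python
from collections import deque


def transitive_dependents(direct_dependents, package):
    """Return {dependent: depth} for everything that relies on `package`.

    `direct_dependents` maps a package name to the projects that depend on
    it directly. Depth 1 means a direct user, depth 2 a user's user, etc.
    """
    depths = {}
    queue = deque([(package, 0)])
    while queue:
        current, depth = queue.popleft()
        for dependent in direct_dependents.get(current, ()):
            # Breadth-first order means the first visit is the shortest path.
            if dependent not in depths and dependent != package:
                depths[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return depths


# Hypothetical data: a small library with two layers of downstream users.
DIRECT_DEPENDENTS = {
    "tiny-logger": ["web-framework", "cli-toolkit"],
    "web-framework": ["storefront-app", "admin-portal"],
    "cli-toolkit": ["deploy-tool"],
    "deploy-tool": ["storefront-app"],
}

affected = transitive_dependents(DIRECT_DEPENDENTS, "tiny-logger")
for name, depth in sorted(affected.items(), key=lambda item: (item[1], item[0])):
    print(f"{name}: {depth} layer(s) removed")
```

Only two of the five affected projects in this toy graph use the library directly; the other three would never appear in a list of its primary users.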
Things get more and more complex as you move through the layers of abstraction. Maintainers can currently do very little to track how their code is being used. I’d like to argue that distribution data is the missing piece.
Open-source software distribution congregates around specific distribution channels throughout the industry, and those platforms, packages and artifact registries hold the key. If we could understand how our open-source tools are being used and make that data available and usable, we’d have a much better sense of the scale of any particular piece of code and the specific users relying on it. Some companies may push back on this idea and point out that their teams already audit the codebase regularly and analyze specific dependencies and potential vulnerabilities themselves, but it’s a bit more complicated than that.
Your software developers might be installing specific pieces of open-source code on their own computers, from text editors to command-line tools to productivity apps, and your company is exposed to those risks as well. Standard tools that perform static code analysis simply don’t go far enough. The download data, on the other hand, provides a very rich dataset that (up until now) has gone almost completely unaddressed.
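To make the value of download data concrete, here is a minimal sketch of the kind of question it could answer: which still-vulnerable versions are being downloaded, and by whom. Everything in it is an assumption for illustration: the record fields (`version`, `source`), the sample records, and the `fixed_in` version are hypothetical and do not reflect any real registry's export format or any real advisory.

```python
from collections import Counter


def parse_version(text):
    """Turn '2.9.1' into (2, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def summarize_vulnerable_downloads(records, fixed_in):
    """Count downloads of versions older than `fixed_in`.

    Returns two Counters: downloads per vulnerable version, and downloads
    per source (a coarse, hypothetical label such as an organization).
    """
    fixed = parse_version(fixed_in)
    by_version = Counter()
    by_source = Counter()
    for record in records:
        if parse_version(record["version"]) < fixed:
            by_version[record["version"]] += 1
            by_source[record["source"]] += 1
    return by_version, by_source


# Hypothetical download records; real data would come from a registry.
DOWNLOADS = [
    {"version": "2.14.0", "source": "example-corp"},
    {"version": "2.14.0", "source": "example-corp"},
    {"version": "2.16.0", "source": "example-labs"},
    {"version": "2.17.1", "source": "example-corp"},
    {"version": "2.9.1", "source": "example-university"},
]

by_version, by_source = summarize_vulnerable_downloads(DOWNLOADS, fixed_in="2.17.1")
print("Vulnerable downloads by version:", dict(by_version))
print("Vulnerable downloads by source:", dict(by_source))
```

Aggregating to a coarse source label rather than storing raw identifiers is one way such a report might limit the sensitive data it handles, though what is appropriate depends on the applicable regulations.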
With this information, the community as a whole could take active steps toward rectifying vulnerabilities because it would know where to direct its efforts. This is not a panacea by any stretch of the imagination, but it is a way we can move toward more robust security across the entire open-source supply chain. As more and more companies rely on open-source code, this becomes increasingly important.
Open-source creators today are working with virtually none of this distribution data, but there are several ways to start making the transition. They can encourage their artifact registries to start exposing this kind of data or switch to alternative solutions that are more amenable to it. Even hosting one’s own registry for the distribution of public artifacts can offer the needed observability.
Of course, this is easier said than done. Hosting one’s own artifact registry is challenging and expensive—even more so when it must perform analytics while also ensuring the system properly handles sensitive data in order to be compliant with regulations like GDPR. Today’s managed registry options for open-source artifacts, on the other hand, are typically slow-moving—especially when poorly funded.
Even the most well-funded registries have only just begun requiring basic security like two-factor authentication this year. When direct and obvious security features like two-factor authentication arrive in the package management space slowly, it’s not surprising that the more indirect pieces of the puzzle like data analytics are still yet to be broadly implemented. As with most big changes in the open-source community, little is achieved unless there is large-scale community buy-in.
As more and more maintainers begin to take steps to ensure greater observability, we will enable better security for the entire ecosystem.
From: forbes
URL: https://www.forbes.com/sites/forbestechcouncil/2022/08/15/how-open-source-distribution-data-can-help-to-make-the-software-supply-chain-more-secure/