GitHub Copilot is the first true product based on large language models
This article is part of our series that explores the business of artificial intelligence
Since GPT-2, there has been a lot of excitement around large language model applications. And over the past few years, we’ve seen LLMs used for many exciting tasks, such as writing articles, designing websites, creating images, and even writing code.
But like I said before, there’s a big gap between showing that a new technology does something cool and using the same technology to create a successful product with a viable business model.
I believe Microsoft just launched the first real LLM product with the public release of GitHub Copilot last week. It is an application that has strong product/market fit, delivers immense added value, is hard to beat, is profitable, and has very strong distribution channels. It can become a source of big profits.
The release of GitHub Copilot is a reminder of two things: first, LLMs are fascinating, but they are most useful when applied to specific tasks, as opposed to general artificial intelligence. And second, the nature of LLMs gives big tech companies like Microsoft and Google an unfair advantage in commercializing them: LLMs are undemocratic.
Copilot is an AI programming tool that comes installed as an extension on popular IDEs like Visual Studio and VS Code. It provides suggestions as you write code, something like auto-completion but for programming. Its capabilities range from completing a line of code to creating entire blocks of code such as functions and classes.
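To make the interaction concrete: the developer writes a comment and a function signature, and Copilot proposes the body. The completion below is a hand-written illustration of the kind of suggestion such a tool produces, not actual Copilot output.

```python
# The developer types only the comment and the signature;
# a Copilot-style tool suggests a body like the one below
# (hand-written here for illustration).

# check whether a number is prime
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

print(is_prime(13))  # → True
```

The developer can accept the suggestion with a keystroke, edit it, or cycle through alternatives.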
Copilot is powered by Codex, a version of OpenAI’s popular GPT-3 model, a large language model that has grabbed headlines for its ability to perform a wide range of tasks. Unlike GPT-3, however, Codex has been fine-tuned specifically for programming tasks. And it produces impressive results.
The success of GitHub Copilot and Codex highlights an important fact: when it comes to actually using LLMs, specialization trumps generalization. When Copilot was first introduced in 2021, CNBC reported: “…back when OpenAI was first training [GPT-3], the startup had no intention of teaching it to code, [OpenAI CTO Greg] Brockman said. It was meant more as a general-purpose language model [emphasis mine] that could, for example, generate articles, correct bad grammar, and translate from one language to another.”
But while GPT-3 found mild success in various applications, Copilot and Codex proved to be big hits in one specific area. Codex cannot write poetry or articles like GPT-3, but it has proven to be very useful for developers of varying levels of expertise. Codex is also much smaller than GPT-3, which makes it more memory- and compute-efficient. And because it was trained for a specific task, as opposed to the open and ambiguous world of human language, it is less prone to the pitfalls that models like GPT-3 often fall into.
It should be noted, however, that just as GPT-3 knows nothing about human language, Copilot knows nothing about computer code. It is a transformer model trained on millions of code repositories. Given a prompt (e.g., a piece of code or a textual description), it tries to predict the sequence of instructions that makes the most sense to come next.
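“Predicting the next sequence of instructions” is ordinary autoregressive decoding. The toy sketch below stands in for Codex’s transformer: instead of a neural network, it uses a hand-made lookup table of likely next tokens, but the generation loop has the same shape — at each step, append the most likely continuation of what has been written so far.

```python
# Toy greedy decoder. A real model like Codex scores every possible
# next token with a neural network; this stand-in uses a hard-coded
# "most likely next token" table to show the loop's structure.
NEXT_TOKEN = {
    "def": "add", "add": "(", "(": "a", "a": ",",
    ",": "b", "b": ")", ")": ":", ":": "return",
    "return": "a + b",
}

def complete(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = NEXT_TOKEN.get(tokens[-1])
        if nxt is None:  # no known continuation: stop generating
            break
        tokens.append(nxt)
    return tokens

print(" ".join(complete(["def"])))
# → def add ( a , b ) : return a + b
```

The model has no notion of what a function *is*; it only continues patterns it has seen, which is why it can produce both excellent suggestions and mistakes no human programmer would make.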
With its huge training corpus and massive neural network, Copilot mostly makes good predictions. But sometimes it makes silly mistakes that even the most novice programmer would avoid. It doesn’t think about programs the way a programmer does. It can’t design software, reason in stages, or weigh user requirements, user experience, and all the other things that go into building successful applications. It is not a replacement for human programmers.
Copilot product/market fit
One of the important milestones for any product is to achieve product/market fit or prove that it can solve certain problems better than market alternatives. In this regard, Copilot was a resounding success.
GitHub first launched Copilot as a preview last June, and it has since been used by over a million developers.
According to GitHub, in files where Copilot is enabled, it accounts for about 40% of the code written. The developers and engineers I spoke to last week say that while there are limits to Copilot’s capabilities, there’s no denying that it dramatically improves their productivity.
For some use cases, Copilot competes with StackOverflow and other code forums, where users have to search for the solution to a specific problem they are facing. Here the added value of Copilot is obvious and palpable: less frustration and distraction, more focus. Instead of leaving their IDE and searching the web for a solution, developers simply type the description or docstring of the function they want, and Copilot does most of the work for them.
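For example, instead of searching the web for “how to turn a title into a URL slug in Python,” the developer writes only the signature and docstring below; the body is the sort of completion a Copilot-style tool supplies (hand-written here for illustration, the function name is my own choice).

```python
import re

def slugify(title):
    """Convert a title to a URL-friendly slug:
    lowercase, words separated by hyphens, no punctuation."""
    # The signature and docstring above are what the developer types;
    # a tool like Copilot suggests a body along these lines.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

print(slugify("GitHub Copilot: First LLM Product?"))
# → github-copilot-first-llm-product
```

The whole exchange happens inside the IDE, which is exactly the “less distraction, more focus” benefit developers describe.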
In other cases, Copilot competes with writing tedious boilerplate by hand, like setting up matplotlib charts in Python (a super frustrating task). Although Copilot’s suggestions may require some tweaking, it takes most of the burden off developers.
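The matplotlib case is a good concrete example. The setup below is a generic sketch (the labels, data, and function name are placeholders of my own) of exactly the kind of repetitive configuration code developers tend to delegate to a tool like Copilot rather than re-type from memory.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

def make_plot(xs, ys):
    # The kind of fiddly, repetitive setup Copilot will happily fill in:
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(xs, ys, marker="o", linestyle="--", label="series")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Example plot")
    ax.grid(True, alpha=0.3)
    ax.legend()
    fig.tight_layout()
    return fig

fig = make_plot([1, 2, 3], [2, 4, 8])
```

None of these lines is hard to write; the frustration is remembering and re-typing them, which is precisely where an autocomplete-style tool saves time.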
In many other use cases, Copilot has been able to establish itself as a superior solution to the problems that many developers face every day. Developers told me about things like running test cases, setting up web servers, documenting code, and many other tasks that previously required manual effort and were strenuous. Copilot helped them save a lot of time in their daily work.
Distribution and profitability
Product/market fit is just one of many components of a successful product. If you have a good product but can’t find the right distribution channels to deliver its value in an efficient and profitable way, then you’re doomed to fail. At the same time, you’ll need a plan to maintain your edge over the competition, prevent other companies from replicating your success, and ensure you can continue to generate value over time.
To make Copilot a successful product, Microsoft had to bring together several very important elements, including technology, infrastructure and market.
First, it needed the right technology, which it acquired through its exclusive license to OpenAI’s models. Since 2019, OpenAI has stopped open-sourcing its technology and instead licenses it commercially to its backers, chief among them Microsoft. Codex and Copilot were created from GPT-3 with the help of OpenAI’s scientists.
Other large technology companies have been able to create large language models comparable to GPT-3. But there is no denying that LLMs are very expensive to train and run.
“For a model 10 times smaller than Codex, the model behind Copilot (which reportedly has 12B parameters), it takes hundreds of dollars to run the evaluation on the benchmark they used in their paper,” Loubna Ben Allal, a machine learning engineer at Hugging Face, told TechTalks. Ben Allal also referred to another benchmark used in the Codex evaluation, which cost thousands of dollars to run for her own, smaller model.
“There are also security issues, because you have to run untrusted programs to evaluate the model, and those programs might be malicious; sandboxes are usually used for security,” Ben Allal said.
Leandro von Werra, another ML engineer at Hugging Face, estimated training costs at tens to hundreds of thousands of dollars, depending on the model’s size and the number of experiments needed to get it right.
“Inference is one of the biggest challenges,” von Werra added in TechTalks comments. “While almost anyone with resources can train a 10B model these days, getting inference latency low enough to feel responsive to the user is an engineering challenge.”
This is where Microsoft’s second advantage comes in. The company has been able to create a large, specialized cloud infrastructure for machine learning models such as Codex. It performs inference and provides suggestions in milliseconds. And more importantly, Microsoft is able to run and deliver Copilot at a very affordable price. Currently, Copilot is offered for $10/month or $100/year, and it will be provided free to students and maintainers of popular open source repositories.
Most of the developers I spoke to were very happy with the pricing model, because the time Copilot saves them is worth far more than its price.
Abhishek Thakur, another ML engineer at Hugging Face who I spoke to earlier this week, said, “As a machine learning engineer, I know a lot goes into building products like these, in particular Copilot, which provides suggestions with sub-millisecond latency. Building an infrastructure that serves these kinds of models for free is not feasible in the real world for a longer period of time.”
However, it is not impossible to run code generator LLMs at affordable rates.
“In terms of the computation and data needed to build these models: it’s quite doable, and there have been some Codex replications, such as Meta’s InCoder and Salesforce’s CodeGen (both now freely available on the Hugging Face Hub), that match the performance of Codex,” von Werra said. “There is definitely some engineering involved in turning the models into a fast and enjoyable product, but it looks like a lot of companies could do that if they wanted to.”
However, that’s where the third piece of the puzzle comes in. Microsoft’s acquisition of GitHub gave it access to the largest developer market, making it easy for the company to put Copilot in the hands of millions of users. Microsoft also owns Visual Studio and VS Code, two of the most popular IDEs, with hundreds of millions of users between them. This reduces the friction of adopting Copilot as opposed to a similar product from another vendor.
With its pricing, efficiency, and market reach, Microsoft appears to have cemented its leadership position in the emerging market for AI-assisted software development. The market could still take other turns. What is certain (and as I have pointed out before) is that large language models will open up many opportunities to create new applications and new markets. But they won’t change the fundamentals of sound product management.
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new technologies, and what we need to watch out for. You can read the original article here.