Code Intelligence before Artificial Intelligence

AI needs to understand code before it can write it

Apr 02, 2024

Code is where it begins

The first expression of a new idea, a new product feature, a bug fix, or a new way to test your product always happens in code. Yes, a lot of machinery has spun up over the years to help with other aspects of building software - integrations, deployment, monitoring, etc. As important as these things are, none of them has a reason to come into the picture if there is no code. This is why developers obsess about their favorite programming languages, development environments, editors, and all other code-focused concepts.

The last couple of years have seen a surge in the use of Generative AI (GenAI) technologies in coding. Whether getting automatic code completions while writing code in an editor or being able to ask questions in a chat window, developers are getting an immense productivity boost from GenAI. Developers can now spend more time on things that uniquely demand human creativity and attention because AI takes care of the grunt work - the stuff that AWS calls “undifferentiated heavy lifting” (love that term) and what we, at Sourcegraph, call “developer toil.”

Code Understanding is the hard part

Writing code is fun (for the most part). The thrill of watching a new piece of code do exactly what you had envisioned is very satisfying. However, ask any developer working in a real codebase inside an organization, and they will tell you that writing code is a tiny portion of their workday. Much of their time is spent understanding code that already exists. Any new code a developer writes needs to build on the structure, practices, domain knowledge, etc., of the existing codebase in which they work. Once that understanding is in place, writing code is simply more fun.

How do we get developers to the fun part faster? This is the problem Sourcegraph set out to solve when the company was founded, and it did so through its original product - Code Search. The idea was to create a “Google for Code” (yes, this was the time when “Google for X” and “Uber for Y” phrases were common in the startup world). Any organization with non-trivial code investments has multiple repositories, potentially spread across numerous code hosts, living in various locations (cloud, on-premises, etc.), and needs hundreds (sometimes thousands) of developers to comprehend and work on these code assets. Code Search allowed developers to search across all these code investments, track insights, and make large-scale changes in an automated manner.

While Code Search is the product, the foundation is something that can only be described as Code Intelligence - the ability to build fast and deep code understanding from vast codebases. Code Intelligence powers the functionality that allows developers to search across millions of lines of code. For an enterprise, it might mean searching and fixing code in thousands of private repositories. For an open source developer, it might mean searching across a large portion of the open source universe using free public code search.

Code Intelligence improves AI CodeGen

Since Code Intelligence is all about becoming codebase aware (by picking up all the relevant context about a codebase), it can massively improve the output of any code-focused tool. With the advent of GenAI, the opportunity for improvement was even more significant, especially because AI hallucinates. One of the most effective ways to reduce these hallucinations is to provide more context to AI about the problem you are trying to solve. Which source files to look in? Which docs to parse through? Through Code Intelligence, Sourcegraph has a history of answering questions like these. So we decided to build a codebase-aware Code AI assistant called Cody. The foundation that allowed us to help developers search across massive codebases now helps us provide contextually accurate code suggestions to developers when they are writing code in their editors or asking code-related questions in chat.
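To make the "which source files to look in?" question concrete, here is a deliberately tiny sketch of the general idea: pick the files that best match a question, then hand only those to an LLM as context. This is not how Cody or Sourcegraph works internally. The function names, the stopword list, the prompt wording, and the sample files are all hypothetical, and simple keyword overlap stands in for real code search.

```python
import re
from collections import Counter

# Hypothetical stopword list so common words do not drive the ranking.
STOPWORDS = {"a", "an", "the", "is", "are", "how", "what", "where",
             "which", "in", "of", "to", "for", "do", "does"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic words (snake_case is split on '_')."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def rank_files(question: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k file paths, ordered by keyword overlap with the question."""
    query_terms = {t for t in tokenize(question) if t not in STOPWORDS}
    scores = {}
    for path, content in files.items():
        counts = Counter(tokenize(content))
        # Score = how many times the question's terms occur in the file.
        scores[path] = sum(counts[term] for term in query_terms)
    ranked = sorted(files, key=lambda p: (-scores[p], p))
    # Files with no overlap are dropped rather than padded in as noise.
    return [p for p in ranked if scores[p] > 0][:top_k]


def build_prompt(question: str, files: dict[str, str], top_k: int = 2) -> str:
    """Assemble a prompt containing only the most relevant files as context."""
    selected = rank_files(question, files, top_k)
    context = "\n\n".join(f"# File: {p}\n{files[p]}" for p in selected)
    return (
        "Use only the context below to answer.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical mini "codebase" used purely for illustration.
EXAMPLE_FILES = {
    "billing/invoice.py": (
        "def compute_invoice_total(items):\n"
        "    return sum(i.price for i in items)\n"
    ),
    "auth/login.py": (
        "def login(user, password):\n"
        "    return check_password(user, password)\n"
    ),
    "README.md": "Internal tools for the billing team.\n",
}

if __name__ == "__main__":
    print(build_prompt("How is the invoice total computed?", EXAMPLE_FILES))
```

With the sample files above, only billing/invoice.py is selected for the invoice question, so the model sees the relevant function and nothing about login. A real system would replace the keyword overlap with proper search and code-graph signals, but the shape stays the same: retrieve first, then generate.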

A good description of what Cody is:

❌ It is not a transactional copilot that fetches results from an LLM
✅ But more like a senior engineer on your team who has built tons of context about your codebase over the years and can point you in the right direction and prevent hours of toil

In other words, the present looks like:

Code Intelligence (from Code) + LLMs = Better Code-Gen

Code Intelligence will enable autonomous coding

The speed with which Code AI tools, despite their flaws, have been adopted by developers worldwide is a very positive sign for the future of software development.

Most Code AI tools today are human-supervised (for good reason), but autonomous coding is coming. That will happen not because a super LLM suddenly knows how to do everything but because reasonably bright LLMs will rely on the most relevant code context provided to them. Here is a great explanation.

There is more on this idea of Code Intelligence (and Code Understanding) in this video from a recent Sourcegraph event in San Francisco:

Code Intelligence is not static. Today, it is a "system of logic" created by understanding code. But, as Quinn describes in his write-up, code is not the only thing relevant to building an understanding of software systems. Several other assets like docs, logs, other dev tools, defect tracking systems, project management tools, etc., can bring much-needed context to further enhance this notion of Code Intelligence.

In other words, the future will look like:

Code Intelligence (from all dev-tools) + LLMs = Autonomous Coding

And that future will be fantastic!

