GitHub Copilot Research Finds “Downward Pressure on Code Quality”

Choice quotes from this research on GitHub Copilot:

  • using Copilot is strongly correlated with mistake code being pushed to the repo
  • current implementation of AI Assistants discourages code reuse
  • we have entered an era where code lines are being added faster than ever before. The better question for 2024: who’s on the hook to clean up the mess afterward
  • code generated during 2023 more resembles an itinerant contributor, prone to violate the DRY-ness of the repos visited

I downloaded the whitepaper to read myself, and in “The Problem with AI-Generated Code”, I’m struck by two realizations that seem obvious in retrospect:

  1. When I used Copilot, and even more so in my brief interactions with ChatGPT, it generated a lot of code that wasn’t needed. This is one of the main points of TDD: write only the minimum amount of code needed, so you don’t end up maintaining code you never required; that maintenance burden adds up fast.
  2. “Being inundated with suggestions for added code, but never suggestions
    for updating, moving, or deleting code” 😨, omg, yes! I was always overjoyed when Copilot would help me auto-complete, but concerned when it suggested things I didn’t really need right now; it was clear it had read similar code bases, so it would add extra fields that seemed like they might be needed in the future, but weren’t right now.
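To make point 2 concrete, here’s a hypothetical sketch of that pattern (the class and fields are invented for illustration, not taken from the whitepaper): a test only demands two fields, but an assistant trained on similar code bases suggests several speculative ones.

```python
from dataclasses import dataclass

# Under TDD, suppose the only failing test requires a name and an email.
# This is the minimum code that makes it pass:
@dataclass
class User:
    name: str
    email: str

# An AI assistant that has seen many similar "User" classes might instead
# suggest something like this, with fields no current test or feature needs:
@dataclass
class SuggestedUser:
    name: str
    email: str
    phone: str = ""          # speculative: nothing requires it yet
    address: str = ""        # speculative
    created_at: str = ""     # speculative
    is_active: bool = True   # speculative

# Every speculative field is code you now have to maintain, document,
# and migrate, whether or not the need ever materializes.
u = User(name="Ada", email="ada@example.com")
print(u.email)
```

The minimal version can always grow later when a real test demands it; the speculative version carries its maintenance cost from day one.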

There’s great context in there that may explain why juniors love it, while us older folks are more concerned.

Another concerning trend, especially for XP fans: new code has increased, while old code isn’t being re-used or refactored as much at large. This implies that, yes, it’s possible teams did deliver faster and learned their users’ true use cases, but at the cost of MUCH higher technical debt. There isn’t enough data in this “150 million lines of code” study about what methodologies and code-quality metrics these projects used to really narrow that down.

That said, I’m still positive. We’re in the early stages of AI coding assistants, and not only are some of the LLMs getting smarter, but many can be tweaked to generate more correct code, and hopefully soon, less of it.