Refactoring vs Refuctoring Whitepaper

CodeScene’s latest whitepaper, Refactoring vs Refuctoring, has some interesting findings, quotes, and improvements for AI / LLM assisted coding.

Findings:

  1. LLM refactorings are only correct 37% of the time
  2. Using their fact-checking, that can be increased to 98%

I’ve seen the tech press focus on either #1 only, to imply “see, told you LLMs are harmful”, or #2 only, to imply “see, LLMs are the best, you’re missing out”.

However, I’m intrigued by finding #3, which is hinted at a few times but not explicitly called out:

3. LLMs are good for writing new code and helping you with ideas.

When writing documentation, a new test, or new code, I would specifically turn Copilot off in VSCode because it was either way off base or wrote documentation in some strange style; both were annoying. I gave up asking ChatGPT for help after 3 tries, even after learning a few prompt variations; the code was either so bad it didn’t work, or just way off base. I’m unclear what new code the researchers think these LLMs are good at.

Quotes:

  • GPT 4 did give better results, but was slower and more expensive, negating the benefits
  • Regarding Copilot and CodeWhisperer: “It’s safe to assume that a human developer shipping code which breaks 60-80% of the time would be asked to look for new challenges. Promptly.”
  • “AI-assistant like Copilot or CodeWhisperer can be useful as the starting point for new code, serving as an inspiration and a coach” <– I have had the opposite reaction.
  • “CodeScene’s fact-checking model is able to validate the proposed code changes, and reject 98% of the incorrect refactorings.” <– pretty amazing
  • Their summary is compelling, mainly that aiming this at fighting technical debt pays large dividends, not just in money but in everyone’s happiness, so automating it is well worth it.
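
To make the "fact-checking" idea concrete, here is a minimal sketch of one way to reject a bad refactoring: run the original and the suggested version on the same sample inputs and only accept the suggestion if they agree. This is my own illustration, not CodeScene's model (the whitepaper quotes above don't describe how theirs works); `behaves_the_same`, `count_long_words`, both suggestions, and `SAMPLES` are all hypothetical names.

```python
from typing import Any, Callable, Iterable


def behaves_the_same(
    original: Callable[..., Any],
    refactored: Callable[..., Any],
    sample_inputs: Iterable[tuple],
) -> bool:
    """Return True only if both versions agree on every sample input."""
    for args in sample_inputs:
        # Treat "raises the same exception type" as agreeing behaviour.
        try:
            expected = original(*args)
        except Exception as exc:
            expected = ("raised", type(exc))
        try:
            actual = refactored(*args)
        except Exception as exc:
            actual = ("raised", type(exc))
        if expected != actual:
            return False
    return True


# Hypothetical original code.
def count_long_words(words, min_length):
    count = 0
    for w in words:
        if len(w) >= min_length:
            count = count + 1
    return count


# Hypothetical LLM suggestion that preserves behaviour.
def suggestion_ok(words, min_length):
    return sum(1 for w in words if len(w) >= min_length)


# Hypothetical LLM suggestion with a subtle off-by-one (> instead of >=).
def suggestion_broken(words, min_length):
    return sum(1 for w in words if len(w) > min_length)


SAMPLES = [(["a", "bb", "ccc"], 2), ([], 1), (["hello"], 5)]

print(behaves_the_same(count_long_words, suggestion_ok, SAMPLES))
print(behaves_the_same(count_long_words, suggestion_broken, SAMPLES))
```

A check like this only proves the two versions agree on the inputs you thought of, so it is a floor, not a guarantee; it is the kind of gate that turns "looks plausible" into "at least didn't break these cases".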

Your Good Code vs My Good Code

Their secret sauce has a few positive caveats to ’em. Their CodeScene tool, which defines “good code”, immediately sounds like ESLint. The problem with ESLint is that the “rules” are “a particular developer’s preference”. And this is why code is Art, not Engineering; we all have our own preferences and interpretations.

However, teams _do_ end up converging on a common set of rules. That means the LLM could be coerced, configured, or even trained on “our version of good code”, and so be uniquely useful to the team.
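
As a rough sketch of what "our version of good code" could look like as configuration, here is a tiny checker that holds a suggested snippet up against a team's agreed thresholds before anyone accepts it. Everything here is an assumption for illustration: `TEAM_RULES`, its two thresholds, and `rule_violations` are made up, and this is neither how ESLint nor how CodeScene defines its rules. It uses Python's standard `ast` module rather than a real linter.

```python
import ast

# Hypothetical thresholds a team might converge on; the values are arbitrary.
TEAM_RULES = {"max_function_lines": 15, "max_parameters": 3}


def rule_violations(source: str, rules: dict = TEAM_RULES) -> list[str]:
    """List every function in `source` that breaks one of the team's rules."""
    max_lines = rules["max_function_lines"]
    max_params = rules["max_parameters"]
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Length counts the lines from the `def` line to the last body line.
        length = node.end_lineno - node.lineno + 1
        if length > max_lines:
            problems.append(f"{node.name}: {length} lines (max {max_lines})")
        # Only plain positional parameters are counted in this sketch.
        n_params = len(node.args.args)
        if n_params > max_params:
            problems.append(f"{node.name}: {n_params} parameters (max {max_params})")
    return problems


# A hypothetical LLM suggestion with one parameter too many for this team.
suggested = "def f(a, b, c, d):\n    return a + b + c + d\n"
print(rule_violations(suggested))
```

The point is that the thresholds live in one shared place, so a different team can swap in its own preferences without touching the checker; a suggestion that comes back with violations can be rejected or sent back to the LLM with the list attached.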