TY - JOUR
T1 - A3-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware
AU - Liao, Dianshu
AU - Pan, Shidong
AU - Sun, Xiaoyu
AU - Ren, Xiaoxue
AU - Huang, Qing
AU - Xing, Zhenchang
AU - Jin, Huan
AU - Li, Qinying
N1 - Publisher Copyright: © 1976-2012 IEEE.
PY - 2024/12
Y1 - 2024/12
AB - LLM-based code generation tools are essential for helping developers in the software development process. Existing tools are often disconnected from the working context, i.e., the code repository, so the generated code differs from what human developers would write. In this paper, we propose a novel code generation framework, dubbed A3-CodGen, that harnesses information within the code repository to generate code with fewer potential logical errors, less code redundancy, and fewer library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that the A3-CodGen framework successfully extracts, fuses, and feeds code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by a higher code reuse rate than that of human developers. This research contributes to the field of code generation, providing developers with a more powerful tool to address the evolving demands of practical software development.
KW - Code generation
KW - code reuse
KW - repository knowledge mining
KW - retrieval-augmented generation
UR - http://www.scopus.com/inward/record.url?scp=85207889505&partnerID=8YFLogxK
U2 - 10.1109/TSE.2024.3486195
DO - 10.1109/TSE.2024.3486195
M3 - Article
AN - SCOPUS:85207889505
SN - 0098-5589
VL - 50
SP - 3369
EP - 3384
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 12
ER -