RepoAgent - An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

Paper	2402.16667
Conference	EMNLP 2024
Github	GitHub - OpenBMB/RepoAgent: An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.

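Automatically generates documentation for a project and keeps it up to date, so engineers do not have to maintain it by hand and new team members can get up to speed on the project quickly.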

Framework

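A developer pushes a code update to the GitHub repository
RepoAgent processes the update automatically
- Change Detector: detects which code has changed
- File Handler: reads the relevant files
- Bi-directional Reference Retriever: resolves which functions call which (callers and callees)
- Parallel Chat Engine: runs the generation process across multiple threads
- Other modules: e.g. Multi-linguistic support, Auto-Commit
Structure analysis and human feedback
- Build/update the Repository Structure, including code_name, parent, start_line, end_line
- Incorporate optional Human Feedback to supplement functionality, parameters, logic, examples, and caveats
Automatically generate/update the documentation (Markdown)
- Based on the changes and structural relationships, generate/update the corresponding documentation blocks → Markdown
  - Modified functions get updated descriptions (red text)
  - New functions are added automatically (marked as New)
  - Existing unchanged items are left as they are
Sync the results back to GitHub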
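
The Repository Structure step above records code_name, parent, start_line and end_line for each code object. The following is a minimal sketch of how such records could be collected with Python's standard `ast` module. It is not RepoAgent's actual implementation; the function name `extract_structure` and the `SAMPLE` source are hypothetical and used only for illustration.

```python
import ast

def extract_structure(source: str) -> list[dict]:
    """Collect one record per class/function with the fields named in the notes.

    Hypothetical helper for illustration, not RepoAgent's real API.
    """
    tree = ast.parse(source)
    records = []

    def visit(node, parent):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                records.append({
                    "code_name": child.name,
                    "parent": parent,              # enclosing class/function, or None
                    "start_line": child.lineno,
                    "end_line": child.end_lineno,  # available on Python 3.8+
                })
                visit(child, child.name)           # descend with this object as parent
            else:
                visit(child, parent)

    visit(tree, None)
    return records

# Assumed sample input, not taken from the paper.
SAMPLE = '''class Greeter:
    def greet(self, name):
        return "hi " + name

def main():
    Greeter().greet("x")
'''

for record in extract_structure(SAMPLE):
    print(record)
```

Keeping the line range next to each name is what would let a change detector map a diff hunk back to the documentation block that needs regenerating.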

RepoAgent Framework.png
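
To make the "which functions call which" idea behind the Bi-directional Reference Retriever concrete, here is a rough sketch that builds both directions of a call map for a single file. It matches calls by bare name only, which is a simplifying assumption: a real retriever would need proper cross-file symbol resolution. The names `build_reference_maps` and `CODE` are hypothetical.

```python
import ast
from collections import defaultdict

def build_reference_maps(source: str):
    """Return (callees, callers): who each function calls, and who calls it.

    Naive name-based matching within one file; an illustrative assumption,
    not the resolution strategy RepoAgent itself uses.
    """
    tree = ast.parse(source)
    funcs = {
        node.name: node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    callees = defaultdict(set)
    callers = defaultdict(set)
    for name, node in funcs.items():
        for sub in ast.walk(node):
            if not isinstance(sub, ast.Call):
                continue
            target = sub.func
            if isinstance(target, ast.Name):
                called = target.id        # plain call: foo()
            elif isinstance(target, ast.Attribute):
                called = target.attr      # attribute call: obj.foo()
            else:
                called = None
            if called in funcs and called != name:
                callees[name].add(called)   # forward edge
                callers[called].add(name)   # reverse edge
    return dict(callees), dict(callers)

# Assumed sample input for illustration.
CODE = '''def load():
    return parse(read())

def read():
    return "x"

def parse(text):
    return text.upper()
'''

callees, callers = build_reference_maps(CODE)
print("callees:", callees)
print("callers:", callers)
```

Having both directions available means a change to one function can trigger a documentation refresh for the functions that depend on it, not just for the function itself.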

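Developers spend roughly 58% of their time understanding code, and high-quality documentation can meaningfully reduce that burden.
But maintaining documentation itself also costs significant time, labor, and resources.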

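To ease the manual maintenance burden, earlier research attempted automatic code summarization,
but these approaches still have clear limitations: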

⇒ These limitations mean that automatic documentation generation, while promising, remains hard to deploy in practice.

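LLMs have now advanced considerably in code understanding and generation, so:
Can LLMs be used to automatically generate and maintain repository-level code documentation and solve the problems above?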

Cite

Can LLMs be used to generate and maintain repository-level code documentation, addressing the aforementioned limitations?

Goals: