AI Agent; The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

21 Mar 2025 in Llm

The AI Scientist: Towards Fully Automated

abstract

완전 자동화되고 확장 가능한 최초의 논문 생성 프레임워크 (연구수행)
광범위한 연구 방향과 간단한 초기 코드베이스를 제공하면,

generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation

실제로 diffusion modeling, transformer-based language modeling, learning dynamics에 적용해보았을 때 각 논문이 $15 이하의 비용으로(..?) 완성됨
평가를 위해 자동 리뷰어를 설계하고 검증하였으며, 자동 리뷰어의 평가에 따르면 AI Scientist가 생성한 논문은 주요 머신러닝 학회에서의 논문 심사 기준을 충족하였다.

framework

벤치마크에서 간단한 기초 훈련 과정을 재현하는 경량 코드 템플릿이 주어지면, 아이디어/실험/논문 작성까지 자동화하는 framework

AI Scientist의 LLM 기반 end-to-end process

아이디어 생성 및 평가
가설 검증 방법 설계
자동 실험 실행 및 데이터 수집
연구 결과 정리 및 논문 작성

idea generation

기초 템플릿 제공

Interestingness 흥미로움
Feasibility: 실현 가능성
Novelty: 기존에 있는 연구가 아닌지, 참신성
Novel에서 true가 나올 경우에만 실험 진행

experiment iteration

Aider; coding assistant
최대 4번까지 iteration
Iteration 동안 자체적으로 수정을 거치며
초록 코드가 추가한 코드, 빨간 코드가 삭제한 코드

paper write-up

예시 1) https://github.com/SakanaAI/AI-Scientist/blob/main/example_papers/adaptive_dual_scale_denoising.pdf

Per-Section Text Generation
Web Search for References
Refinement
Compilation

생각보다 논문의 형태는 갖춰져 있다.

automated paper reviewing

Each Review is generated for $0.25 ~ $0.50

Self-Reflection 5회 Iteration
5 ensembled reviews
1-shot review example from ICLR 2022 review guidelines

LLM기반 리뷰가 가장 성능이 좋았고,
GPT-4o-mini와 Claude Sonnet 3.5 는 비용 효율적이었으나 성능이 매우 떨어짐

70%의 정확도 달성
자동화된 LLM 리뷰어가 사람 리뷰어보다 높은 정확도를 보이기도 했다.

limitation

잘못된 아이디어 생성 및 비합리적인 추론

Vision task를 수행하지 못함
- 템플릿에 vision task 관련 내용이 제공된다면 읽을 수 없다. Multimodal을 통해 해결할 수 있을 것.
Hallucination of experimental details
비전문적인 표현
Positive Interpretation of Results
- ex. KL Divergence는 낮을수록 좋은 성능을 뜻하는데 단순히 값이 높아졌다는 이유로 ”향상되었음”이라고 해석 (ㅋㅋㅋ)
Minimal References
- 기존 연구를 참고를 덜한 만큼 논리적 비약의 가능성이 높음

AI가 논문을 쓰는 날이 오는구나.. 😰 완성도는 낮지만 연구의 프로세스를 인간처럼 수행할 수 있다는 점에서 의의가 있다고 본다.

AI Agent; The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

abstract

framework

idea generation

experiment iteration

paper write-up

automated paper reviewing

limitation

ese2o

Error

abstract

framework

idea generation

experiment iteration

paper write-up

automated paper reviewing

limitation

Templates (for web app):

Error