![](https://velog.velcdn.com/images/0404_not_found/post/5d2ff526-a698-4ae7-a8ad-af1bf1364133/image.png)
1. Introduction
- LLMs have become very powerful and are used in many fields
- Since Llama 2 and 3, open-source LLMs have seen significant growth
    - users may select the optimal model based on their use case
- Graph data structure
    - can represent the relationships between models, their optimal use cases, and their capabilities
    - creates a powerful framework for seamless model integration, intelligent query routing, and optimized performance
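As a toy illustration of such a graph (all model names and capabilities here are invented, not from the paper), the model/use-case relationships can live in a small adjacency structure that a router consults:

```python
# Toy directed graph of models: edges point from a coordinating model
# to the specialized models it can route queries to. Names are hypothetical.
graph = {
    "master": ["math_model", "bio_model", "law_model"],
    "math_model": [],
    "bio_model": [],
    "law_model": [],
}

# Capabilities attached to each node let a router pick the best neighbor.
capabilities = {
    "math_model": "mathematics",
    "bio_model": "biology",
    "law_model": "law",
}

def route(query_topic: str) -> str:
    """Pick the neighbor of 'master' whose capability matches the topic."""
    for node in graph["master"]:
        if capabilities[node] == query_topic:
            return node
    return "master"  # fall back to handling the query directly

print(route("biology"))  # -> bio_model
```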
2. Related works
On-device AI models & AI agents with functional tokens
- functional tokens can select suitable models or functions
- synergizes with the Octopus framework
- selects the best neighbor, restructures the information, and transmits the optimized information
Multi-Agent LLMs
- harnesses collective intelligence from specialized agents
- challenges: integration difficulties, data-sharing issues, and maintaining smooth coordination between agents
- explores possibilities like cross-domain expertise and real-time collaboration
- parallel function calling → self-connections
- sequential action processing → graph traversal
LLM Scaling law
- leverages distributed computing and node expansion to address the scalability issues → nearly unlimited node scalability
3. Methodology
3.1 LM for classification from Octopus v2
- functional token from Octopus v2
- f: the choice from the set F; params: the reformulated information derived from the query q
- P(f, params | q)
- used to select the optimal choice and reformulate the query to transmit
- selects the best neighboring nodes and passes the information to subsequent nodes
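The decomposition P(f, params | q) can be pictured as two stages: first choose a functional token f, then rewrite the query into the params for that function. A minimal sketch, with keyword matching standing in for the trained classifier (token and function names are hypothetical):

```python
# Hypothetical functional tokens mapped to the functions they stand for.
FUNCTIONS = {"<func_math>": "math", "<func_bio>": "biology"}

def select_function(query: str) -> str:
    """Stage 1: choose the functional token f from the set F."""
    if any(w in query.lower() for w in ("integral", "equation", "solve")):
        return "<func_math>"
    return "<func_bio>"

def reformulate(query: str, token: str) -> str:
    """Stage 2: produce params — the query restructured for the chosen node."""
    return f"[{FUNCTIONS[token]}] {query.strip()}"

query = "Solve the equation x^2 = 4"
token = select_function(query)
print(token, "|", reformulate(query, token))
```

In the real system both stages come out of a single forward pass of the Octopus v2-style model; the split here just makes the two factors of the probability explicit.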
3.2 LMs as nodes in graph
- directed, heterogeneous graph G = (N, E)
- master nodes N_m: coordinate queries by directing them to worker nodes
- worker nodes N_w: receive the necessary information and handle the task
- the master node passes the information and the worker nodes handle it
![](https://velog.velcdn.com/images/0404_not_found/post/4905f222-b933-47b1-bdc0-927749315c69/image.png)
- user queries q and responses y
- P(y | q) = P(y | q; G)
- a single-step task involves only one worker node
- P(y | q; G) = P(N_w, q_h | q; N_m) · P(y | q_h; N_w)
    - the first factor comes from Octopus v2: it selects the best neighboring worker N_w and reformats the query into q_h
    - the second factor is the worker computing the result
- a multi-step task involves several sequential interactions
- it simply expands the formula:
  P(y | q; G) = ∏_{i=0}^{k−1} P(N_w^i, q_h^i | q; N_m^i) · P(y | q_h^i; N_w^i)
- to answer one user query, only two small models need to be activated
- uses functional tokens to get rid of RAG
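The multi-step product is just the single-step routing applied k times along a path through the graph. A sketch where each step's master picks a worker and hands over a reformatted query (the pick/run callables are trivial stand-ins for the Octopus v2 router and the worker LM):

```python
def single_step(query, master, pick_worker, run_worker):
    """One factor of the product: master selects (N_w, q_h), worker answers."""
    worker, q_h = pick_worker(master, query)   # P(N_w, q_h | q; N_m)
    return run_worker(worker, q_h)             # P(y | q_h; N_w)

def multi_step(query, masters, pick_worker, run_worker):
    """Chain k single steps; each step's answer feeds the next query."""
    y = query
    for master in masters:                     # product over i = 0 .. k-1
        y = single_step(y, master, pick_worker, run_worker)
    return y

# Tiny demo with trivial stand-ins:
pick = lambda m, q: (f"worker_of_{m}", q.upper())
run = lambda w, q: f"{w}:{q}"
print(multi_step("hi", ["m0", "m1"], pick, run))
```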
3.3 Task planning using graphs for multistep operations
- traditional approach
    - all available functions are listed
    - the LLM generates a plan from the user query and the list
    - a small model cannot grasp the extensive descriptions effectively
    - it does not consider the inherent relevance among function descriptions
    → use a graph instead
- Graph-based approach
- only neighboring nodes are considered
- reducing the complexity
- using Octopus v2
    - enables rapid query redirection and reformatting
    - applying the functional token turns each LM into a single AI agent that can take a single function call
    - alternatively, a single node can be an ordinary LM (Llama3, Phi3)
    - at another layer, use Octopus v3 to choose among the nodes
![](https://velog.velcdn.com/images/0404_not_found/post/879fdd22-2f29-4c66-babd-e8ce158b8745/image.png)
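The complexity reduction comes from scoping each planning step to the current node's neighbors instead of the full function list. A sketch, with a placeholder relevance score standing in for the trained router's choice (graph and scores are made up):

```python
def plan_next(graph, current, score):
    """Consider only the neighbors of `current`, not every node in the graph."""
    neighbors = graph[current]
    if not neighbors:
        return None  # leaf node: nothing further to plan
    return max(neighbors, key=score)

graph = {"root": ["a", "b", "c"], "a": [], "b": ["d"], "c": [], "d": []}
# Placeholder relevance scores; a real system would use the LM's selection.
score = {"a": 0.1, "b": 0.9, "c": 0.3, "d": 0.5}.get
print(plan_next(graph, "root", score))  # evaluates 3 candidates, not all 5 nodes
```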
3.4 Functional token and dataset collections
- conceptualize each model as a distinct function
- for specific models, detail the required prompt template in the function's docstring
![](https://velog.velcdn.com/images/0404_not_found/post/282db3fa-d0ab-47a3-b1e3-f2d612672106/image.png)
![](https://velog.velcdn.com/images/0404_not_found/post/1989489c-416c-4b12-9188-90f1257979cf/image.png)
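"Each model as a distinct function" can be pictured as a Python stub whose docstring carries the prompt template the worker expects (the model name and template below are invented for illustration, not from the paper):

```python
def physics_gpt(query: str) -> str:
    """
    A specialized model for physics questions.

    Prompt template required by this model:
        <|user|>{query}<|end|><|assistant|>
    """
    prompt = f"<|user|>{query}<|end|><|assistant|>"
    # In a real deployment this would call the hosted worker model;
    # here we just return the formatted prompt to show the template.
    return prompt

print(physics_gpt("Why is the sky blue?"))
```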
- construct the dataset with a strategy similar to Octopus v2
- synthetic data to train the functional tokens
- increase the generation temperature to accommodate diverse queries
3.5 System design of LM graph
- Worker node deployment
- Nw as an individual LM
- serverless architecture
- limit the worker size to 10B
- Master node deployment
- base model with fewer than 10B
- compact LoRA can be integrated to extend functional token capabilities
- a single base model with multiple LoRAs, one per worker
- LoRAX library
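One base master model with one LoRA adapter per worker means routing reduces to picking an adapter id before generation. A sketch of that bookkeeping (adapter names are hypothetical; actual multi-adapter serving would go through LoRAX or a similar system):

```python
# Map each worker node to the LoRA adapter that emits its functional token.
ADAPTERS = {
    "math_worker": "lora-math-v1",
    "bio_worker": "lora-bio-v1",
}

def adapter_for(worker: str) -> str:
    """Return the adapter id the base master model should load for this worker."""
    try:
        return ADAPTERS[worker]
    except KeyError:
        raise ValueError(f"no LoRA adapter registered for {worker!r}")

print(adapter_for("math_worker"))  # -> lora-math-v1
```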
- Communication
- workers and the master are distributed across various devices
- internet connectivity is essential
- master → on-device, workers → cloud
![](https://velog.velcdn.com/images/0404_not_found/post/c2b204dc-6c5f-4922-ac27-e88a44ceffe5/image.png)
4. Experiments
4.1 Task and models
- MMLU with 17 distinct models
![](https://velog.velcdn.com/images/0404_not_found/post/68b37b78-ef0d-4650-b181-5b77dd2047ca/image.png)
- specialized models from HF, chosen based on benchmarks, popularity, and endorsements
- not all tasks have a specialized model → Llama 3 with a system prompt is used instead (e.g., the Humanities task)
4.2 MMLU evaluation
![](https://velog.velcdn.com/images/0404_not_found/post/959aa892-efad-4282-8a44-3d4e3666d0f6/image.png)
5. Discussion and Future works
5.1 How to train a vertical model
- fine-tune with domain-specific expertise
- gather a substantial corpus
- ensure the data is diverse, well-organized, and embodies the domain knowledge
- clean the data
- use the HF SFT Trainer
5.2 Future work
The key idea: instead of RAG, which looks at a given query and decides the next action based on similarity, attaching functional-token values during training lets the model select the next action much faster. This seems useful when working with agents.