LLMs are a sub-technology of NLP and, at the same time, one of the main tools of generative AI (GAI).
        NLP
         │
         ▼
     +-------+
     |  LLM  |
     +-------+
      ▲     ▲
      │     │
     GAI   Internal architectures
           ├── RNN
           ├── LSTM
           └── Transformer
Use them for:
Avoid using them for:
While LLMs are capable of amazing things, not enough people know how to actually turn them into real products.
-> Searching bookstores, libraries, and Inflearn and Udemy courses turns up practically nothing. Coursera might have something, but it lacks subtitle support.. Even this book's Korean edition is still on pre-order..
LLMs don’t behave like traditional software — in many cases, it’s necessary to train or fine-tune a model, or alternatively, access one through a vendor’s API.
Key topics include the best tools and infrastructure for working with LLMs, techniques like prompt engineering, and practical considerations such as cost control, scalability, and deployment strategies.
Working with an API is an incredibly easy and cheap way to build a prototype and get your hands dirty quickly.
import os
import openai

# Uses the pre-1.0 openai SDK; the API key is read from the environment.
openai.api_key = os.getenv("OPENAI_API_KEY")

chat_completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello world"}],
)

# Print the assistant's reply:
print(chat_completion.choices[0].message["content"])
Not to mention, the LLM itself is only half the problem in deploying them to production. There’s still an entire application you need to build on top of it.
With GPT-3, OpenAI introduced Reinforcement Learning from Human Feedback (RLHF) to improve the model’s performance. However, this also meant that contractors had to review user prompts to provide feedback — essentially putting a "band-aid" on deeper issues instead of addressing core model flaws directly.
While OpenAI allows some level of fine-tuning, many critical aspects remain out of users' control.
If the model becomes a core part of your product, you're vulnerable to vendor decisions — like sudden price increases or changes in model behavior (e.g., political bias) — with little recourse.
Key point: The more essential a technology is to your business, the more important it is to have full control over it, rather than relying on third-party services.
For a Competitive Edge
As of now, a search for "BERT" on the Hugging Face Hub returns over 13,700 models — each fine-tuned by individuals to best fit their specific needs.
What you likely need is a custom language model designed to perform the few tasks that matter most to your business — better than any general-purpose model — and without sharing your data with Microsoft or other potential competitors.
In contrast, relying on a third-party LLM via API introduces integration and latency challenges. You have to send data over the network and wait for a response, which slows things down and can be unreliable. While APIs are convenient, they’re inherently slower and not always dependable. When low latency is crucial, it’s far better to host the service in-house.
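If you do stay on a vendor API, that unreliability has to be handled in application code. A minimal retry-with-backoff sketch (the helper name, the flaky example call, and the delay values are illustrative, not from the book):

```python
import time

def call_with_retry(fn, max_attempts=3, base_delay=0.5):
    """Retry a flaky remote call with exponential backoff (illustrative)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)

# Example: a call that fails twice before succeeding.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

print(call_with_retry(flaky_call, base_delay=0.05))  # ok
```

In production you would also add a request timeout, since a hung connection is as damaging as a failed one.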
As noted in the previous section on Competitive Edge, two projects have already prioritized edge computing, and many more are emerging. Projects like llama.cpp and alpaca.cpp were among the first, and innovation in this space is accelerating rapidly. Techniques like 4-bit quantization, Low-Rank Adaptation (LoRA), and Parameter-Efficient Fine-Tuning (PEFT) have been developed specifically to meet these demands.
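To see why LoRA makes fine-tuning so much cheaper, compare parameter counts: instead of updating a full d×d weight matrix, LoRA trains two low-rank factors B (d×r) and A (r×d). A back-of-the-envelope sketch (the hidden size and rank are illustrative values, not figures from the book):

```python
d, r = 768, 8  # hidden size and LoRA rank (illustrative values)

full_params = d * d       # updating the full weight matrix W
lora_params = 2 * d * r   # updating only the low-rank factors B and A

print(full_params)   # 589824
print(lora_params)   # 12288
print(f"{lora_params / full_params:.1%} of the full update")  # 2.1%
```

Training ~2% of the weights per adapted layer is what brings fine-tuning within reach of a single consumer GPU.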
All of these factors should be considered when deciding whether to build or buy an LLM. At first glance, buying may seem cheaper—after all, one of the most popular services on the market today costs only $20 USD per month. But compare that to running the same model for inference (not even training) on an EC2 instance, which could cost around $250,000 USD annually.
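The gap between those figures is easy to quantify. A rough sketch using the numbers from the paragraph above (it ignores usage limits and scale, so treat it as a ballpark, not a real TCO analysis):

```python
subscription_monthly = 20      # USD/month, popular hosted service
self_hosted_annual = 250_000   # USD/year, EC2 inference estimate from the text

subscription_annual = subscription_monthly * 12
print(subscription_annual)                        # 240
print(self_hosted_annual // subscription_annual)  # ~1041x more per year
```

The comparison is lopsided on purpose: a $20/month seat serves one user, while a self-hosted cluster serves your whole product, which is exactly why the build-or-buy decision depends on scale.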
If you're just looking to build a proof of concept, the projects mentioned in the Competitive Edge section can help you get a demo running for the cost of electricity on your own machine. Some frameworks even make training affordable—models with up to 20 billion parameters can be trained for as little as $100. And perhaps the biggest advantage: if you build your own model, your costs remain stable—unlike subscription services, which often increase over time.
(Content to be added)
[Reference] LLMs in Production (2023)