This is a document about installing VLLM on a Mac. VLLM is usually installed on systems with an Nvidia GPU, but it can also be used on a Mac. The key is to install it so that it runs the LLM on the CPU.
Installing Brew
Install Brew on the Mac. Brew plays the same role as apt or dnf on Linux: various programs ported to macOS can be installed with the brew command. It does not just install them; it also tracks program updates and handles package removal.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Next, install Python 3.12 using brew.
Installing Python 3.12
Install python3.12 with brew.
/opt/homebrew/bin/brew install python@3.12
==> Fetching downloads for: python@3.12
==> Downloading https://ghcr.io/v2/homebrew/core/python/3.12/manifests/3.12.12
######################################################################## 100.0%
==> Fetching python@3.12
==> Downloading https://ghcr.io/v2/homebrew/core/python/3.12/blobs/sha256:a000fd4856f62bf895f00834763827b03270f1c9fdcb27326a32edac66b7fe05
######################################################################## 100.0%
==> Pouring python@3.12--3.12.12.arm64_tahoe.bottle.tar.gz
==> Caveats
Python is installed as
  /opt/homebrew/bin/python3.12

Unversioned and major-versioned symlinks `python`, `python3`, `python-config`, `python3-config`, `pip`, `pip3`, etc.
pointing to `python3.12`, `python3.12-config`, `pip3.12` etc., respectively, are installed into
  /opt/homebrew/opt/python@3.12/libexec/bin

If you do not need a specific version of Python, and always want Homebrew's `python3` in your PATH:
  brew install python3

`idle3.12` requires tkinter, which is available separately:
  brew install python-tk@3.12

See: https://docs.brew.sh/Homebrew-and-Python
==> Summary
🍺  /opt/homebrew/Cellar/python@3.12/3.12.12: 3,611 files, 66.8MB
==> Running `brew cleanup python@3.12`...
The installation is complete.
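If you want to make sure the interpreter actually works before moving on, a quick version check is enough (the output below assumes the 3.12.12 bottle installed above):

$ /opt/homebrew/bin/python3.12 --version
Python 3.12.12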
Creating a VirtualEnv
Python 3 provides virtual environments. Create a Python virtual environment for the VLLM installation as follows.
$ pwd
/Users/systemv
$ /opt/homebrew/bin/python3.12 -m venv python3.12
Activate the virtual environment.
$ source python3.12/bin/activate
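Once activated, commands such as python3 and pip should resolve into the venv's bin directory. A quick way to confirm (the path assumes the /Users/systemv layout used here):

$ which python3
/Users/systemv/python3.12/bin/python3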
Installing VLLM
To install VLLM, first download the source.
$ pwd
/Users/systemv/python3.12
$ git clone https://github.com/vllm-project/vllm.git
Once the VLLM source has been downloaded successfully, install it as a Python package. The following commands are run from inside the cloned vllm directory.
$ pip install -r requirements/cpu.txt   # install dependency packages
$ export VLLM_TARGET_DEVICE=cpu
$ export VLLM_BUILD_WITH_CUDA=0
$ pip install -e .
Building wheels for collected packages: vllm
  Building editable for vllm (pyproject.toml) ... done
  Created wheel for vllm: filename=vllm-0.11.1rc2.dev191+g80e945298-0.editable-cp312-cp312-macosx_26_0_arm64.whl size=14144 sha256=bc26664471cc7220ad02b629c4db2060b4e16cf5b150f65203b8f2f649a6b0cb
  Stored in directory: /private/var/folders/zl/btg3q3c167d57jph52vg688c0000gn/T/pip-ephem-wheel-cache-02cmvmxd/wheels/ac/31/60/d6c757e5b6aacd66f820349f9c37c76eec071553670e52274f
Successfully built vllm
Installing collected packages: vllm
Successfully installed vllm-0.11.1rc2.dev191+g80e945298
Check that VLLM was installed correctly as follows.
$ vllm --version
INFO 10-21 19:04:17 [__init__.py:225] Automatically detected platform cpu.
INFO 10-21 19:04:17 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
0.11.1rc2.dev191+g80e945298
$ which vllm
/Users/systemv/python3.12/bin/vllm
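You can also check that the package imports cleanly from Python inside the venv. This is a minimal sanity check; the same "detected platform cpu" INFO lines may be printed before the version string.

$ python3 -c "import vllm; print(vllm.__version__)"
0.11.1rc2.dev191+g80e945298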
Testing
Now run a test using the following code.
from vllm import LLM, SamplingParams

# Define the list of prompts to run inference on.
prompts = [
    "Hello, my name is",
    "The future of AI is",
    "It's a beautiful day, isn't it?",
]

# Define the sampling parameters.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)

# Create the LLM object with the model to use.
# The first run downloads the model, so it may take a while.
llm = LLM(model="facebook/opt-125m")

# Generate text for the prompts.
outputs = llm.generate(prompts, sampling_params)

# Print the results.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Running it, however, produced the following error.
$ python3.12 vllm_test.py
INFO 10-21 19:33:59 [__init__.py:225] Automatically detected platform cpu.
INFO 10-21 19:33:59 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 10-21 19:34:00 [utils.py:243] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 10-21 19:34:01 [model.py:663] Resolved architecture: OPTForCausalLM
INFO 10-21 19:34:01 [model.py:1751] Using max model len 2048
INFO 10-21 19:34:02 [arg_utils.py:1314] Chunked prefill is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
INFO 10-21 19:34:02 [arg_utils.py:1320] Prefix caching is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
INFO 10-21 19:34:03 [__init__.py:225] Automatically detected platform cpu.
INFO 10-21 19:34:03 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 10-21 19:34:04 [utils.py:243] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 10-21 19:34:05 [model.py:663] Resolved architecture: OPTForCausalLM
INFO 10-21 19:34:05 [model.py:1751] Using max model len 2048
INFO 10-21 19:34:06 [arg_utils.py:1314] Chunked prefill is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
INFO 10-21 19:34:06 [arg_utils.py:1320] Prefix caching is not supported for ARM and POWER, S390X and RISC-V CPUs; disabling it for V1 backend.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "<frozen runpy>", line 287, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/systemv/python3.12/vllm_examples/vllm_test.py", line 15, in <module>
    llm = LLM(model="facebook/opt-125m")
  File "/Users/systemv/python3.12/vllm/vllm/entrypoints/llm.py", line 324, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/llm_engine.py", line 188, in from_engine_args
    return cls(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/llm_engine.py", line 122, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 93, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 639, in __init__
    super().__init__(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 468, in __init__
    with launch_core_engines(vllm_config, executor_class, log_stats) as (
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/utils.py", line 862, in launch_core_engines
    local_engine_manager = CoreEngineProcManager(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/utils.py", line 142, in __init__
    proc.start()
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html

Traceback (most recent call last):
  File "/Users/systemv/python3.12/vllm_examples/vllm_test.py", line 15, in <module>
    llm = LLM(model="facebook/opt-125m")
  File "/Users/systemv/python3.12/vllm/vllm/entrypoints/llm.py", line 324, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/llm_engine.py", line 188, in from_engine_args
    return cls(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/llm_engine.py", line 122, in __init__
    self.engine_core = EngineCoreClient.make_client(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 93, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 639, in __init__
    super().__init__(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/core_client.py", line 468, in __init__
    with launch_core_engines(vllm_config, executor_class, log_stats) as (
  File "/opt/homebrew/Cellar/python@3.12/3.12.12/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/utils.py", line 880, in launch_core_engines
    wait_for_engine_startup(
  File "/Users/systemv/python3.12/vllm/vllm/v1/engine/utils.py", line 937, in wait_for_engine_startup
    raise RuntimeError(
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}
It does not work as-is.
Troubleshooting
The problem is that vLLM starts its engine core in a separate process using multiprocessing's spawn start method, which re-imports the main module; the script therefore needs its entry point wrapped in an if __name__ == "__main__": guard (a main() function). Modify the source code as follows.
from vllm import LLM, SamplingParams

# Define the list of prompts to run inference on.
prompts = [
    "Hello, my name is",
    "The future of AI is",
    "It's a beautiful day, isn't it?",
]

# Define the sampling parameters.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)

def main():
    # Create the LLM object with the model to use.
    # The first run downloads the model, so it may take a while.
    llm = LLM(
        model="facebook/opt-125m",
        tensor_parallel_size=1,
        enable_prefix_caching=False,
        trust_remote_code=True,
        gpu_memory_utilization=0.75,
        max_model_len=2048
    )

    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

if __name__ == "__main__":
    main()
After changing the code as above, it runs fine.
API Server
VLLM can also be served through an API server: the client sends a request in JSON format and gets the result back. The command to start the API server is as follows.
#!/bin/zsh

export VLLM_USE_CUDA=0

python3.12 -m vllm.entrypoints.api_server \
    --port 8000 \
    --model facebook/opt-125m \
    --tensor_parallel_size 1 \
    --max-model-len 2048 \
    --enable-prefix-caching \
    --trust-remote-code \
    --gpu-memory-utilization 0.75
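Once the server is running, it can be exercised with a plain HTTP request. The sketch below is a minimal example assuming the demo /generate endpoint exposed by vllm.entrypoints.api_server, with the prompt and sampling parameters in the JSON body; the exact field names and response shape can differ between vLLM versions, so treat it as a starting point rather than a fixed contract.

#!/bin/zsh
# Minimal client call against the API server started above.
# Assumes the server listens on localhost:8000 and exposes POST /generate.
curl -s http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{
          "prompt": "The future of AI is",
          "max_tokens": 100,
          "temperature": 0.8,
          "top_p": 0.95
        }'

The response should come back as JSON containing the text generated for the prompt.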