
Commit 39cfe23

Put more wiggle room. (#3189)

* Put more wiggle room.
* Fixing the Makefile by using the lockfile.
* Pre-commit

1 parent 3758029

2 files changed: +5 −3 lines

server/Makefile (3 additions, 2 deletions)

@@ -10,6 +10,7 @@ include Makefile-flashinfer
 unit-tests:
 	pip install -U pip uv
 	uv pip install -e ".[dev]"
+	uv sync --inexact --extra dev --active
 	pytest -s -vv -m "not private" tests

 gen-server:
@@ -30,14 +31,14 @@ gen-server-raw:
 	touch text_generation_server/pb/__init__.py

 install-server: gen-server
-	uv pip install -e ".[accelerate, compressed-tensors, quantize, peft, outlines]"
+	uv sync --inexact --extra accelerate --extra compressed-tensors --extra quantize --extra peft --extra outlines --active


 install: install-cuda
 	echo "Installed server"

 install-cuda: install-server install-flash-attention-v2-cuda install-flash-attention
-	uv pip install -e ".[attention,bnb,marlin,moe]"
+	uv sync --inexact --extra attention --extra bnb --extra marlin --extra moe --active
 	uv pip install nvidia-nccl-cu12==2.22.3
 	kernels download .
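The "use the lockfile" fix swaps `uv pip install -e ".[extras]"` for `uv sync`: the former resolves dependency versions at install time, so repeated builds can pick up different versions, while `uv sync` installs exactly what uv.lock pins. A minimal sketch of the distinction, with hypothetical target names (only the uv flags mirror this commit):

# Hypothetical Makefile targets; names are illustrative, not from this repo.
install-dev-unlocked:
	uv pip install -e ".[dev]"                # resolves fresh; versions can drift between runs

install-dev-locked:
	uv sync --inexact --extra dev --active    # installs the exact versions pinned in uv.lock
	# --inexact: do not prune packages that other targets installed
	# --extra dev: include the optional "dev" dependency group
	# --active: target the active virtualenv rather than the project's .venv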

server/text_generation_server/models/globals.py (2 additions, 1 deletion)

@@ -28,7 +28,8 @@
     raise RuntimeError("Prefix caching is only supported with flashinfer")

 MEM_POOL = torch.cuda.graph_pool_handle() if torch.cuda.is_available() else None
-TGI_WIGGLE_ROOM = float(os.getenv("TGI_WIGGLE_ROOM", "0.93"))
+# Test a 70B model on 4xA100 under load for latest failure
+TGI_WIGGLE_ROOM = float(os.getenv("TGI_WIGGLE_ROOM", "0.90"))
 assert TGI_WIGGLE_ROOM > 0
 assert TGI_WIGGLE_ROOM < 1
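TGI_WIGGLE_ROOM is a safety factor on GPU memory budgeting: dropping the default from 0.93 to 0.90 reserves 10% headroom instead of 7%, which the new comment ties to a 70B model failing on 4xA100 under load. A hedged sketch of how such a factor is typically applied when sizing a KV cache; the helper below is illustrative, not TGI's actual code:

import os

import torch

# Fraction of free GPU memory we allow ourselves to claim; the remainder
# absorbs fragmentation and allocation spikes under load.
TGI_WIGGLE_ROOM = float(os.getenv("TGI_WIGGLE_ROOM", "0.90"))
assert 0 < TGI_WIGGLE_ROOM < 1


def usable_kv_cache_bytes(device: torch.device) -> int:
    """Return a KV-cache budget: currently free bytes scaled by the factor."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return int(free_bytes * TGI_WIGGLE_ROOM)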
