Just opened PR huggingface/text-embeddings-inference#103 to add SageMaker-compatible images to HF TEI, following the pattern established by huggingface/text-generation-inference#147.
Since the required routes were already in place, the implementation focused primarily on CI infrastructure and some creative workarounds:
- Added build-and-push-sagemaker-image steps to the build_* workflows
- Added a sagemaker target to Dockerfile-cuda and a custom sagemaker_entrypoint.sh

Initial tests suggest it works quite well with text embedding and reranker models (see image below for an example with BAAI/bge-reranker-base). I'm currently working on a notebook demo and some load/stress tests to compare HF TEI's performance against similar solutions.
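A SageMaker container has to answer GET /ping and POST /invocations on port 8080. Most of an entrypoint like sagemaker_entrypoint.sh is therefore glue code that turns SageMaker-style environment variables into launcher flags. The actual script is in the PR. What follows is only a hypothetical Python sketch of that kind of translation. It assumes the HF_MODEL_ID / HF_MODEL_REVISION convention used by the TGI SageMaker image, and it assumes TEI's text-embeddings-router binary accepts --model-id, --revision and --port. Both assumptions are mine and have not been checked against the PR.

```python
from collections.abc import Mapping

# SageMaker routes /ping and /invocations traffic to port 8080 inside the container.
SAGEMAKER_PORT = 8080


def build_router_command(env: Mapping[str, str]) -> list[str]:
    """Hypothetical sketch: map SageMaker-style env vars onto router CLI flags.

    The variable names (HF_MODEL_ID, HF_MODEL_REVISION) and the router flags
    are assumptions modeled on the TGI SageMaker entrypoint, not the PR's code.
    """
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        # Fail fast: without a model there is nothing to serve.
        raise ValueError("HF_MODEL_ID must be set")

    cmd = ["text-embeddings-router", "--model-id", model_id]

    # Pin a specific model revision only if one was provided.
    revision = env.get("HF_MODEL_REVISION")
    if revision:
        cmd += ["--revision", revision]

    # Always bind to the port SageMaker expects.
    cmd += ["--port", str(SAGEMAKER_PORT)]
    return cmd
```

For example, build_router_command({"HF_MODEL_ID": "BAAI/bge-reranker-base"}) returns ["text-embeddings-router", "--model-id", "BAAI/bge-reranker-base", "--port", "8080"]. A real entrypoint would then hand control to that process, for instance with os.execvp(cmd[0], cmd), so the container's PID 1 is the server itself.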
Still under review, so stay tuned!
