Discover an elegant way to deploy small language models (SLMs) or quantized versions of larger models on AWS Lambda using function URLs and response streaming.
📝 Read the full article on AWS Community.
👨‍💻 All code and documentation are available at github.com/JGalego/SLaMbda.