serving

Here are 104 public repositories matching this topic...

torchpipe / torchpipe

Bridge the gap between deep learning training and serving

deployment inference pytorch ray serve tensorrt serving pipeline-parallelism torch2trt triton-inference-server ray-serve cvcuda

Updated Jul 3, 2024
C++

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Jul 3, 2024
Python

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

structured-data serving unstructured-data unified-sql vector-database mysql-compatibility embedding-search embedding-store key-value-distributed-store vector-ocean real-time-semantic-search

Updated Jul 3, 2024
Java

pytorch / serve

Serve, optimize and scale PyTorch models in production

docker kubernetes machine-learning cpu deep-learning metrics gpu optimization pytorch serving mlops

Updated Jul 3, 2024
Java

deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution

deep-learning deployment inference pytorch serving djl

Updated Jul 3, 2024
Java

Lightning-AI / LitServe

High-throughput serving engine for AI models, with a friendly interface and enterprise scale.

api ai serving

Updated Jul 2, 2024
Python

vespa-engine / vespa

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated Jul 2, 2024
Java

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

performance gpu model production cuda efficiency inference transformer llama speculative serving llm llm-inference llama3

Updated Jul 2, 2024
C++

openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated Jul 2, 2024
C++

tensorflow / serving

A flexible, high-performance serving system for machine learning models

python machine-learning deep-neural-networks deep-learning neural-network cpp tensorflow ml serving

Updated Jul 2, 2024
C++

SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

kubernetes machine-learning deployment serving aiops production-machine-learning mlops machine-learning-operations

Updated Jul 2, 2024
HTML

polyaxon / haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Updated Jun 24, 2024
Python

intel / intel-ai-inference-samples

Intel® AI Inference Samples provide example code for deploying optimized inference on Intel platforms.

sample ai intel inference bert serving ipex openvino

Updated Jun 24, 2024
Python

friendliai / friendli-client

Friendli: the fastest serving engine for generative AI

ai ml inference gpt inference-server mistral inference-engine serving mlops gpt3 llm stable-diffusion llms generative-ai llmops llm-serving llm-inference llama2 llm-ops

Updated Jun 19, 2024
Python

aws-samples / amazon-sagemaker-model-serving-using-aws-cdk

This repository provides an AI/ML service (machine learning model serving) modernization solution using Amazon SageMaker, AWS CDK, and AWS Serverless services.

serverless cdk serving sagemaker mlops