information-retrieval

Here are 2,417 public repositories matching this topic...

AnonCatalyst / WebDiver

WebDiver is a versatile Python script for crawling websites, extracting internal and external links, titles, and descriptions. It's useful for tasks such as web analysis, OSINT (Open Source Intelligence) gathering, and competitive analysis.

information-retrieval osint python3 information-extraction information-technology webcrawler webscraping cyber-security information-gathering webcrawling osinttool osint-python osint-tool osint-tools webcrawlers osint-toolkit

Updated Jul 3, 2024
Python

apache / lucene

Apache Lucene open-source search software

search java search-engine information-retrieval backend nosql lucene

Updated Jul 3, 2024
Java

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

nlp machine-learning information-retrieval ocr deep-learning chatbot orchestration preprocessing pdf-to-text data-pipelines document-parser rag document-understanding table-structure-recognition llm llmops retrieval-augmented-generation

Updated Jul 3, 2024
Python

LongxingTan / open-retrievals

All-in-One: Text Embedding, Retrieval, Reranking and RAG

nlp information-retrieval retrieval indexing embeddings ranking semantic-search triplet-loss dense rag contrastive-learning dense-retrieval tranformers llm retrieval-augmented-generation rerank

Updated Jul 3, 2024
Python

danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

python information-retrieval nextjs enterprise-search rag ai-chat chatgpt gen-ai

Updated Jul 3, 2024
Python

FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

information-retrieval embeddings sentence-embeddings text-semantic-similarity llm retrieval-augmented-generation

Updated Jul 3, 2024
Python

aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

search nlp information-retrieval ai etl ml semantic-search opensearch dataprep llm

Updated Jul 3, 2024
Python

rapidsai / raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks that make high-performance applications easier to write.

Updated Jul 3, 2024
Cuda

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Jul 2, 2024
HTML

felladrin / MiniSearch

Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

search nlp search-engine machine-learning information-retrieval typescript ai artificial-intelligence webapp question-answering searxng llm gpu-accelerated generative-ai llm-inference retrieval-augmented-generation web-llm ratchet-ml wllama

Updated Jul 2, 2024
TypeScript

apache / solr

Apache Solr open-source search software

search java search-engine information-retrieval backend nosql solr lucene

Updated Jul 2, 2024
Java

weaviate / weaviate

Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering while offering the fault tolerance and scalability of a cloud-native database.

Updated Jul 2, 2024
Go

deepset-ai / haystack

🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.