Development of a multi-user book search engine platform written in Java, anchored in an inverted index, encompassing crawling, cleaning, indexing, and efficient querying for heightened precision and user experience.


LiBook: Book Search Engine 🔍

This repository contains the source code for an inverted-index-based search engine for books obtained both from Project Gutenberg and directly from registered users' accounts. We also implemented both relational and non-relational datamarts to query the available books. This is a micro-service-oriented application consisting of the following modules:

  • Crawler: Obtains books directly from the Project Gutenberg platform and stores them in our datalake.
  • Cleaner: Processes the books and prepares them to be indexed.
  • Indexer: Indexes the books into our inverted index structure in Hazelcast.
  • MetadataDatamartBuilder: Creates a metadata datamart for queries.
  • QueryEngine: Offers an API for users to be able to query our inverted index.
  • UserService: Handles users' accounts in MongoDB, and session tokens through a distributed Hazelcast datamart.
  • UserBookProcessor: Processes the books uploaded by users and sends them to the cleaner.
  • ApiGateway: Serves an API merging all the public APIs of the final application, improving the security of incoming requests.

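The heart of the pipeline is the inverted index built by the Indexer and searched by the QueryEngine. As a rough illustration of the idea (the real project keeps this structure in a distributed Hazelcast map, not an in-memory dict, so this is only a conceptual sketch):

```python
from collections import defaultdict

def build_inverted_index(books):
    """Map each term to the set of book ids whose text contains it."""
    index = defaultdict(set)
    for book_id, text in books.items():
        for term in text.lower().split():
            index[term].add(book_id)
    return index

def query(index, terms):
    """Return ids of books containing every query term (AND semantics)."""
    result = None
    for term in terms:
        postings = index.get(term.lower(), set())
        result = postings if result is None else result & postings
    return result or set()

books = {
    1: "The adventures of Tom Sawyer",
    2: "Adventures of Huckleberry Finn",
    3: "Moby Dick or The Whale",
}
index = build_inverted_index(books)
print(sorted(query(index, ["adventures"])))  # [1, 2]
```

Querying then costs one lookup per term plus a set intersection, instead of a scan over every book's full text.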
Crucially, this project employs three distinct datamart technologies—Hazelcast, MongoDB, and Rqlite. Rqlite, based on SQLite and adapted for clustered usage, is particularly notable for its role in distributed relational database management within the application. The integration of these datamarts enhances the overall scalability, efficiency, and versatility of the search engine, accommodating both centralized and distributed data processing needs.





1) How to run (Docker and Docker Compose)

Generate the corresponding Docker image for each module. In our case, we deploy the Crawler, Cleaner, API Gateway, User Service and User Book Processor services on a Google Cloud virtual machine instance. To do so, after connecting to the Google Cloud server, run docker-compose up with the compose file provided in this repository. Then, run the remaining micro-services on-premises.
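The compose file in the repository wires the cloud-side services together; a simplified sketch of its shape (the service names, image tags, ports and variables below are illustrative assumptions, not the repository's actual file) could look like:

```yaml
version: "3"
services:
  crawler:
    image: ricardocardn/crawler        # illustrative image name
    environment:
      - SERVER_MQ_PORT=443
  cleaner:
    image: ricardocardn/cleaner        # illustrative image name
    ports:
      - "80:80"
  user-service:
    image: ricardocardn/user-service
    environment:
      - MONGO_ATLAS_PASSWORD=...
  api-gateway:
    image: ricardocardn/api-gateway    # illustrative image name
    ports:
      - "443:443"
```

With a file of this shape in place, a single docker-compose up on the server starts all the cloud-side services at once.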

Indexer

To execute the indexer, we should run the docker image as follows:

docker run -p 8082:8082 \
           -e "SERVER_API_URL=http://34.16.163.134" \
           -e "SERVER_MQ_PORT=443" \
           -e "index=1" \
           --network host \
           ricardocardn/indexer

Make sure to specify the --network host option, or problems related to Hazelcast networking may arise.

Metadata Datamart Builder

To run this service, we need both the rqlite and Metadata Datamart Builder images. First, start the rqlite image on a single machine of your cluster:

docker run -p 4001:4001 -p 4002:4002 rqlite/rqlite

And, for the Metadata Datamart builder, execute:

docker run -e "SERVER_MQ_PORT=443" \
           -e "SERVER_API_URL=http://34.16.163.134" \
           -e "SERVER_CLEANER_PORT=80" \
           -e "LOCAL_MDB_API=http://34.16.163.134" \
           ricardocardn/metadata-datamart-builder
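Rqlite speaks standard SQLite SQL, so the kind of metadata query this datamart serves can be sketched locally with Python's built-in sqlite3 module (the table name and columns here are illustrative assumptions, not the project's actual schema):

```python
import sqlite3

# In-memory SQLite as a stand-in for the Rqlite metadata datamart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "book_id INTEGER PRIMARY KEY, title TEXT, author TEXT, language TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        (1, "Moby Dick", "Herman Melville", "en"),
        (2, "Don Quijote", "Miguel de Cervantes", "es"),
    ],
)

# A typical metadata lookup: all English-language books.
rows = conn.execute(
    "SELECT title, author FROM metadata WHERE language = ?", ("en",)
).fetchall()
print(rows)  # [('Moby Dick', 'Herman Melville')]
```

In the deployed system the same kind of statement would be sent to rqlite's HTTP API rather than to a local database file.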

Query Engine

The Query Engine makes use of both Hazelcast and Rqlite, so run its image as follows:

docker run -p 8080:8080 \
           --network host \
           susanasrez/queryengine2

User Service

The User Service connects to both the Hazelcast and MongoDB datamarts, so make sure to use --network host and to have a MongoDB Atlas account. Then run:

docker run -p 8082:8082 \
           -e "MONGO_ATLAS_PASSWORD=..." \
           -e "SERVER_API_URL=http://34.16.163.134" \
           -e "SERVER_BOOKS_PORT=80" \
           ricardocardn/user-service

Local API Gateway

The Local API Gateway lets each machine in the cluster expose the services it runs behind a single entry point, so that the load balancer can route each user request to any machine.

docker run -p 8080:8080 \
           -e "USER_SERVICE_API=http://localhost:{local user service's port}" \
           -e "QUERY_ENGINE_SERVICE_API=http://localhost:{query engine service's port}" \
           -e "CLEANER_SERVICE_API=http://{server's ip}:{cleaner's port}" \
           ricardocardn/local-api-gateway
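The routing idea behind the gateway can be sketched as a prefix table mapping request paths to backend base URLs (the path prefixes and addresses below are illustrative assumptions, not the gateway's actual routes):

```python
# Illustrative routing table: path prefix -> backend service base URL.
ROUTES = {
    "/users": "http://localhost:8082",    # User Service (assumed prefix)
    "/query": "http://localhost:8080",    # Query Engine (assumed prefix)
    "/clean": "http://34.16.163.134:80",  # Cleaner on the server (assumed prefix)
}

def resolve(path):
    """Pick the backend whose prefix matches the request path."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    raise ValueError(f"no route for {path}")

print(resolve("/query/books?term=whale"))
```

Because every machine runs the same gateway with its own local ports in the table, the load balancer can hand any request to any machine and have it forwarded to the right service.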

Credits
