
A mini Google. Custom web crawler & indexer written in Golang.



emarifer/search-engine


Search Engine



🚧 This is a work in progress, so the application may not yet have all of its planned features.




Features 🚀

  • Golang-Powered: Built in Go to leverage its performance, memory safety and static typing, which make it one of the strongest choices for backend development.
  • Crawler based on the depth-first search (DFS) algorithm: DFS traverses tree or graph data structures by following each branch as deep as possible before backtracking, which suits both HTML documents (trees) and the links between pages (a graph). To avoid processing the same link more than once, a unique constraint is applied to the URLs stored for crawling in subsequent cycles.
  • Full-text search indexing: text is processed by a parser/tokenizer built on the Snowball stemming library, and the resulting terms feed an inverted index stored in the database, which makes looking up search terms efficient.
  • SQL database integration: crawled URLs and indexing results are stored in a PostgreSQL database, for better scalability and more efficient searches.
  • Caching of search responses (in JSON format): the Fiber framework provides middleware that makes caching server responses easy.
  • Using the Fiber framework and the A-H/Templ and Htmx libraries: together, Fiber, Templ and Htmx greatly speed up building a simple user interface for minimal administration of the search engine. Check out some of my other repositories for more details.
  • Interfaces in the services package: the architecture follows a typical "onion model", in which each layer is unaware of the layer above it and has a single responsibility. The services package is one such layer; defining it through interfaces improves the separation of responsibilities and enables dependency injection.
  • Concurrency in the engine's crawling functions: link crawling is always a heavy task, so the engine uses concurrency, one of Go's strongest features, to try to speed it up. 🚧 This is a work in progress!!
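The DFS crawling strategy described above can be sketched in a few lines of Go. This is a minimal, self-contained illustration, not the project's actual crawler: it assumes a hypothetical in-memory `linkGraph` instead of fetching real pages, and uses a `seen` map in place of the database's unique constraint on URLs.

```go
package main

import "fmt"

// linkGraph is a hypothetical stand-in for the web: each URL maps to the
// links found on its page. A real crawler would fetch and parse HTML instead.
type linkGraph map[string][]string

// crawlDFS visits pages depth-first from start, following links no deeper
// than maxDepth. The seen set plays the role of the unique constraint on
// stored URLs: each URL is processed only the first time it is reached.
func crawlDFS(g linkGraph, start string, maxDepth int) []string {
	seen := make(map[string]bool)
	var order []string

	var visit func(url string, depth int)
	visit = func(url string, depth int) {
		if depth > maxDepth || seen[url] {
			return
		}
		seen[url] = true
		order = append(order, url)
		// Go as deep as possible along each link before trying the next one.
		for _, next := range g[url] {
			visit(next, depth+1)
		}
	}
	visit(start, 0)
	return order
}

func main() {
	g := linkGraph{
		"a.com":   {"a.com/x", "b.com"},
		"a.com/x": {"a.com", "c.com"}, // links back to a.com (a cycle)
		"b.com":   {"c.com"},
	}
	fmt.Println(crawlDFS(g, "a.com", 5)) // [a.com a.com/x c.com b.com]
}
```

Because `a.com/x` links back to `a.com`, the `seen` check cuts the cycle, so every page appears exactly once in the visit order.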
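To illustrate how an inverted index answers a query, here is a minimal in-memory sketch. It is not the project's implementation: the hypothetical `stem` function is a naive suffix stripper standing in for Snowball, and the index lives in a Go map rather than in PostgreSQL. A query returns only the documents that contain every (stemmed) term.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stem is a deliberately naive placeholder for a real stemmer such as
// Snowball: it strips the first matching suffix from a short list.
func stem(word string) string {
	for _, suffix := range []string{"ing", "ed", "s"} {
		if len(word) > len(suffix)+2 && strings.HasSuffix(word, suffix) {
			return strings.TrimSuffix(word, suffix)
		}
	}
	return word
}

// tokenize lowercases the text, splits it on anything that is not a letter
// or digit, and stems each token.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, f := range fields {
		fields[i] = stem(f)
	}
	return fields
}

// InvertedIndex maps each term to the set of document IDs containing it.
type InvertedIndex map[string]map[int]bool

// Add records every term of text under docID.
func (idx InvertedIndex) Add(docID int, text string) {
	for _, term := range tokenize(text) {
		if idx[term] == nil {
			idx[term] = make(map[int]bool)
		}
		idx[term][docID] = true
	}
}

// Search returns the sorted IDs of documents containing every query term.
func (idx InvertedIndex) Search(query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var result []int
	for id := range idx[terms[0]] {
		all := true
		for _, t := range terms[1:] {
			if !idx[t][id] { // indexing a nil inner map safely yields false
				all = false
				break
			}
		}
		if all {
			result = append(result, id)
		}
	}
	sort.Ints(result)
	return result
}

func main() {
	idx := InvertedIndex{}
	idx.Add(1, "Crawling web pages with Go")
	idx.Add(2, "A web crawler indexes pages")
	idx.Add(3, "Go concurrency patterns")
	fmt.Println(idx.Search("web pages")) // [1 2]
}
```

The lookup cost depends on the number of documents per term rather than on the total size of the corpus, which is what makes an inverted index efficient for search.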
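The interface-based layering described for the services package can be sketched as follows. All names here (`URLStore`, `memoryStore`, `CrawlService`, `ErrDuplicate`) are hypothetical and do not come from the repository; the point is that the service receives its storage dependency through its constructor, so an in-memory store and a Postgres-backed one are interchangeable.

```go
package main

import (
	"errors"
	"fmt"
)

// URLStore is a hypothetical storage-layer interface. The service layer
// depends only on this interface, never on a concrete database.
type URLStore interface {
	SaveURL(url string) error
	ListURLs() []string
}

// ErrDuplicate mimics a unique-constraint violation.
var ErrDuplicate = errors.New("url already stored")

// memoryStore is an in-memory implementation, handy for tests; a Postgres
// implementation would satisfy the same interface.
type memoryStore struct {
	urls []string
	seen map[string]bool
}

func newMemoryStore() *memoryStore {
	return &memoryStore{seen: make(map[string]bool)}
}

func (m *memoryStore) SaveURL(url string) error {
	if m.seen[url] {
		return ErrDuplicate
	}
	m.seen[url] = true
	m.urls = append(m.urls, url)
	return nil
}

func (m *memoryStore) ListURLs() []string { return m.urls }

// CrawlService sits one layer above storage and receives its dependency
// through the constructor (dependency injection).
type CrawlService struct {
	store URLStore
}

func NewCrawlService(store URLStore) *CrawlService {
	return &CrawlService{store: store}
}

// Enqueue stores new URLs, skips ones already stored, and reports how many
// were added. Any other storage error is returned to the caller.
func (s *CrawlService) Enqueue(urls ...string) (int, error) {
	added := 0
	for _, u := range urls {
		err := s.store.SaveURL(u)
		if errors.Is(err, ErrDuplicate) {
			continue
		}
		if err != nil {
			return added, err
		}
		added++
	}
	return added, nil
}

func main() {
	svc := NewCrawlService(newMemoryStore())
	n, _ := svc.Enqueue("a.com", "b.com", "a.com")
	fmt.Println(n) // 2
}
```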
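One common way to apply Go's concurrency to crawling is a bounded worker pool. The sketch below assumes a hypothetical `fetchLinks` function in place of real HTTP requests and HTML parsing; it illustrates the pattern only and is not the engine's current (work-in-progress) code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchLinks is a hypothetical stand-in for downloading a page and
// extracting its links; here it just returns a canned result.
func fetchLinks(url string) []string {
	return []string{url + "/about", url + "/blog"}
}

// crawlConcurrently fetches every URL using at most `workers` goroutines
// and collects all discovered links.
func crawlConcurrently(urls []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan []string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- fetchLinks(u)
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	var found []string
	for links := range results {
		found = append(found, links...)
	}
	sort.Strings(found) // workers finish in arbitrary order
	return found
}

func main() {
	links := crawlConcurrently([]string{"a.com", "b.com", "c.com"}, 2)
	fmt.Println(len(links), links)
}
```

Capping the number of workers keeps the crawler from opening an unbounded number of connections at once, while still overlapping the slow network waits.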


🖼️ Screenshots:

Admin login screen and dashboard:

  

Response to a search performed with cURL:

👨‍🚀 Installation and Usage

Before compiling the view templates, you need to regenerate the CSS. First install the dependencies required by Tailwind CSS and daisyUI (Node.js must be installed on your system), then regenerate the main.css file with the following commands:

$ cd tailwind && npm i
$ npm run build-css-prod # or `npm run watch-css` to regenerate the CSS in watch mode during development

Since the PostgreSQL database runs in a Docker container, you also need Docker installed. Then run this command in the project folder:

$ docker compose up -d

The following commands are also useful for managing the database container:

$ docker start search-engine # start container
$ docker stop search-engine # stop container
$ docker exec -it search-engine psql -U postgres # (user: postgres, without password)

Besides the obvious prerequisite of having Go installed on your machine, you need Air for hot reloading while editing code.

Tip

To get autocompletion and syntax highlighting for the Templ templating language in VS Code, install the templ-vscode extension (for vim/nvim, install this plugin). To generate the Go code corresponding to these templates, download this executable binary from GitHub and place it in your system's PATH. The command is:

$ templ generate # `templ generate --watch` to enable watch mode

Tip

This command generates Go code from the .templ templates, so it is required before starting the application. With the --watch flag, it also monitors the .templ files and recompiles them every time you save changes. See the Templ documentation for installation and IDE support.

Build for production:

$ go build -ldflags="-s -w" -o ./bin/search-engine ./cmd/search-engine/main.go # ./bin/search-engine to run the application / Ctrl + C to stop the application

Start the app in development mode:

$ air # This compiles the view templates automatically / Ctrl + C to stop the application

Happy coding 😀!!