A simple crawler to calculate disk usage of a directory.

dheshanm/diskusage-rust

DiskUsage - Rust

A simple crawler to calculate disk usage of a root directory.

This crawler is written in Rust. It uses the walkdir crate to traverse the directory tree and rayon to parallelize the traversal. Each file's and directory's metadata is read through std::fs, using the Unix-specific std::os::unix::fs::MetadataExt trait. The results are then written to a PostgreSQL database using sqlx.
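A minimal sketch of the per-entry metadata read, assuming only the standard library. The project uses walkdir for traversal and rayon for parallelism; this sketch swaps both for a sequential, recursive std::fs::read_dir walk so it compiles without external crates. The EntryInfo struct, its fields, and the crawl function are hypothetical names chosen for illustration, not the project's actual types or database columns.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides uid() and blocks()
use std::path::{Path, PathBuf};

/// Hypothetical row shape; the project's real columns may differ.
#[allow(dead_code)]
#[derive(Debug)]
struct EntryInfo {
    path: PathBuf,
    is_dir: bool,
    size_bytes: u64,      // apparent size, from len()
    allocated_bytes: u64, // blocks() counts 512-byte units, like `du`
    uid: u32,             // numeric owner id
}

/// Recursively walks `root`, recording metadata for every entry.
/// symlink_metadata is used so symbolic links are recorded but not followed.
fn crawl(root: &Path, out: &mut Vec<EntryInfo>) -> io::Result<()> {
    let meta = fs::symlink_metadata(root)?;
    out.push(EntryInfo {
        path: root.to_path_buf(),
        is_dir: meta.is_dir(),
        size_bytes: meta.len(),
        allocated_bytes: meta.blocks() * 512,
        uid: meta.uid(),
    });
    if meta.is_dir() {
        for entry in fs::read_dir(root)? {
            crawl(&entry?.path(), out)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Root directory from the first argument, defaulting to the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let mut entries = Vec::new();
    crawl(Path::new(&root), &mut entries)?;
    let file_bytes: u64 = entries.iter().filter(|e| !e.is_dir).map(|e| e.size_bytes).sum();
    println!("{} entries, {} bytes in non-directory entries", entries.len(), file_bytes);
    Ok(())
}
```

The sketch records both len() and blocks() * 512 on purpose: len() is the apparent size, while blocks() counts the 512-byte units actually allocated, which is closer to what du reports for sparse or very small files. Which of the two the project stores is not stated here.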

Features

  • Crawls the directory tree and calculates the disk usage of each file.
  • Uses rayon to parallelize the traversal.
  • Uses sqlx to write the data to a PostgreSQL database.
  • Estimates the disk usage of a folder using a recursive query in the database.
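The folder estimate can be pictured with a sketch under stated assumptions: it supposes each crawled entry is stored with an id, a parent_id pointing at its directory, and a size, and that the recursive query is a WITH RECURSIVE common table expression. The table name entries, these column names, the Row struct, and the folder_size function are all hypothetical; the project's real schema is the one in its diagram. The Rust function performs the same expansion in memory so the logic can be checked without a database.

```rust
use std::collections::HashMap;

// Hypothetical SQL equivalent (table and column names are assumptions):
//   WITH RECURSIVE subtree AS (
//       SELECT id, size FROM entries WHERE id = $1
//       UNION ALL
//       SELECT e.id, e.size FROM entries e JOIN subtree s ON e.parent_id = s.id
//   )
//   SELECT SUM(size) FROM subtree;

/// One flattened row: each entry points at its parent directory.
struct Row {
    id: u64,
    parent_id: Option<u64>,
    size: u64,
}

/// Sums the size of `folder_id` and all of its descendants,
/// mirroring the recursive CTE sketched above.
fn folder_size(rows: &[Row], folder_id: u64) -> u64 {
    // Index children by parent id so each expansion is a lookup.
    let mut children: HashMap<u64, Vec<&Row>> = HashMap::new();
    for row in rows {
        if let Some(parent) = row.parent_id {
            children.entry(parent).or_default().push(row);
        }
    }
    // Seed with the folder itself (the non-recursive half of the CTE).
    let mut pending: Vec<&Row> = rows.iter().filter(|r| r.id == folder_id).collect();
    let mut total = 0;
    // Keep expanding children until none remain (the recursive half).
    while let Some(row) = pending.pop() {
        total += row.size;
        if let Some(kids) = children.get(&row.id) {
            pending.extend(kids.iter().copied());
        }
    }
    total
}

fn main() {
    let rows = vec![
        Row { id: 1, parent_id: None, size: 4096 },    // root folder
        Row { id: 2, parent_id: Some(1), size: 100 },  // file in root
        Row { id: 3, parent_id: Some(1), size: 4096 }, // subfolder
        Row { id: 4, parent_id: Some(3), size: 50 },   // file in subfolder
    ];
    println!("folder 1: {} bytes", folder_size(&rows, 1));
}
```

An unknown folder id yields 0, matching SUM over an empty subtree only loosely (SQL would return NULL there).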

Use Cases

  • Calculate the disk usage of an especially large directory.
    • If looking for something more lightweight, consider using the du command or parallel-disk-usage instead.
  • See who owns each file and directory.
    • Most other tools look only at the size of files and directories.
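Because MetadataExt exposes each entry's owning uid, ownership comes from the same metadata call as the size. A minimal sketch, assuming the goal is a per-owner byte total; usage_by_owner is a hypothetical helper, not part of the project. Mapping a numeric uid to a user name is not covered by the standard library and would need something like libc's getpwuid, so the sketch reports raw uids.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides uid()
use std::path::Path;

/// Adds the apparent size of every regular file under `dir` to a per-uid total.
/// Symlinks are neither followed nor counted (symlink_metadata + is_file).
fn usage_by_owner(dir: &Path, totals: &mut HashMap<u32, u64>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let meta = fs::symlink_metadata(&path)?;
        if meta.is_dir() {
            usage_by_owner(&path, totals)?;
        } else if meta.is_file() {
            *totals.entry(meta.uid()).or_insert(0) += meta.len();
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut totals = HashMap::new();
    usage_by_owner(Path::new("."), &mut totals)?;
    for (uid, bytes) in &totals {
        println!("uid {uid}: {bytes} bytes");
    }
    Ok(())
}
```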

Prerequisites

  • A configured and accessible PostgreSQL database.

Limitations

  • The crawler is Unix-specific: it uses the std::os::unix::fs::MetadataExt trait to read file metadata.
  • sqlx checks queries against the database at compile time, so a valid schema must already exist in the database when the project is compiled. The schema can be created with the init_db binary.
    • This may require compiling only part of the project first, which can be done with the cargo build --bin init_db command.

Usage

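  1. Build the project (export DATABASE_URL first, since sqlx checks queries against it at compile time):
     export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
     cargo build --release
     If the schema does not exist yet, build only init_db first with cargo build --release --bin init_db, run step 2, then build the rest.
  2. Initialize the database:
     export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
     ./target/release/init_db
  3. Run the crawler:
     export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
     ./target/release/disk_usage -r <root_directory>
  4. Get folder size:
     export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
     ./target/release/estimate -p <path>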
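The init_db, disk_usage, and estimate commands above all expect DATABASE_URL to be exported. A small preflight check, sketched here as a hypothetical helper (check_database_url is not part of the project), catches a missing variable or a non-PostgreSQL scheme before any connection attempt. It accepts both postgres:// and postgresql://, the two URI schemes PostgreSQL connection strings use.

```rust
use std::env;

/// Returns the connection string if it is present and uses a PostgreSQL scheme.
/// Hypothetical helper; the project's binaries may read the variable differently.
fn check_database_url(raw: Option<&str>) -> Result<&str, String> {
    let url = raw.ok_or("DATABASE_URL is not set")?;
    if url.starts_with("postgres://") || url.starts_with("postgresql://") {
        Ok(url)
    } else {
        Err("DATABASE_URL must start with postgres:// or postgresql://".to_string())
    }
}

fn main() {
    let raw = env::var("DATABASE_URL").ok();
    match check_database_url(raw.as_deref()) {
        // The URL itself is not printed because it contains the password.
        Ok(_) => println!("DATABASE_URL looks valid"),
        Err(e) => eprintln!("{e}"),
    }
}
```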

Database Schema

The database schema is visualized below:

[Figure: database schema diagram]
