
SourasishBasu/File-Wizard

File Wizard

An open-source file conversion web app built with NextJs, Python and AWS
(API Gateway for the HTTP API, Lambda functions and S3 object storage).
It converts .docx files to .pdf.

Features · Overview · API Routing · Usage · Authors

Features

  • Website

    • NextJs App Router
    • Amazon Web Services for backend functionality
    • Support for HTTP API, S3 File Storage, and Lambda functions
    • Edge runtime-ready
  • AWS Infrastructure

Tech Stack

NextJs · Python · EC2 · API Gateway · S3 · Lambda

Overview

AWS Architecture

  • A static site hosted on S3 serves a document upload form. When the user clicks Upload File, the site sends a GET request to an HTTP API created with API Gateway, which invokes a Lambda function.

  • The API responds with a presigned URL for the uploads-bucket. The site then automatically sends a PUT request with the .docx file data to that URL.

  • A second Lambda function listens for PUT object events (s3:ObjectCreated:Put) on the uploads-bucket. It parses the event record for the file name and sends a POST request to the Python Flask app that performs the document conversion.

  • An EC2 instance is deployed from an Ubuntu OS image, and a Python script is set up to run on it as a background process.

  • This Python microservice converts documents with pandoc (through the pypandoc package) and is exposed as an API with Flask, listening for POST requests on a specified port.

  • It downloads the specified file, saves it under its ID, converts it, and uploads the converted file to the output-bucket on S3. The static site then returns the download link for the converted file in the output-bucket.
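The trigger step in the flow above can be sketched in Python, the language of the trigger_converter.py Lambda. This is a minimal illustration, not the repository's file: the /convert route, the bucket and filename JSON fields and the example IP address are assumptions. It shows the part that is easy to get wrong, namely that S3 URL-encodes object keys in event records, so they must be decoded before being passed on.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with http://{ec2-instance-public-ipv4-address}:5000
# plus whatever route app.py actually exposes ("/convert" is an assumption).
CONVERTER_URL = "http://203.0.113.10:5000/convert"


def extract_uploads(event):
    """Return (bucket, key) pairs for every S3 record in the event."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes keys in event notifications (a space arrives as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def build_request(bucket, key, url=CONVERTER_URL):
    """Build the JSON POST request for the converter (field names are assumed)."""
    body = json.dumps({"bucket": bucket, "filename": key}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def lambda_handler(event, context):
    results = []
    for bucket, key in extract_uploads(event):
        # This call waits for the conversion to finish; Lambda's default timeout
        # is 3 seconds, so the function timeout may need to be raised.
        with urllib.request.urlopen(build_request(bucket, key), timeout=10) as resp:
            results.append({"key": key, "status": resp.status})
    return {"statusCode": 200, "body": json.dumps(results)}
```

Using only the standard library (urllib instead of requests) keeps the sketch deployable without bundling extra packages into the Lambda.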

Configuring application on AWS

S3 Configuration (Only for using the static site as frontend)

The frontend of the app is hosted as a static site in a separate S3 bucket.

Note

To learn more about the S3 static site and how to deploy it, see frontend/README.md.

API Routing

The HTTP API is hosted on AWS using API Gateway and a Lambda function that runs the getPresignedURL.js app. The source code for the Lambda function is in lambda/presignedURL.js.

Note

To learn more about the getPresignedURL.js app and how to deploy it, see lambda/README.md.
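The repository's Lambda is written in JavaScript. Purely to illustrate the same idea, here is a Python sketch of a presigned-upload handler using boto3's `generate_presigned_url`. The UPLOAD_BUCKET environment variable, the filename query parameter, the 5-minute expiry and the uploadURL/key response fields are assumptions, not the app's actual contract.

```python
import json
import os
import uuid

# Assumed configuration: bucket name from an environment variable, 5-minute expiry.
UPLOAD_BUCKET = os.environ.get("UPLOAD_BUCKET", "uploads-bucket")
URL_EXPIRY_SECONDS = 300


def make_object_key(filename):
    """Prefix the file name with a random ID so uploads never overwrite each other."""
    return f"{uuid.uuid4().hex}-{os.path.basename(filename)}"


def api_response(status, payload):
    """Shape a response for an API Gateway Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {
            "Content-Type": "application/json",
            # Lets the static site (a different origin) read the response,
            # in case CORS is not configured on the HTTP API itself.
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    # Imported here so the helpers above work without boto3 installed;
    # the Lambda Python runtime already ships with boto3.
    import boto3

    params = event.get("queryStringParameters") or {}
    key = make_object_key(params.get("filename", "upload.docx"))
    url = boto3.client("s3").generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key},
        ExpiresIn=URL_EXPIRY_SECONDS,
    )
    return api_response(200, {"uploadURL": url, "key": key})
```

The browser then uploads the file with a PUT request to uploadURL, sending the raw file bytes as the body. Because that request comes from the static site's origin, the uploads-bucket also needs a CORS rule that allows PUT from that origin.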

Set up the Flask microservice on EC2 for PDF conversion

  1. Create an EC2 t2.micro instance with an Ubuntu Linux AMI and note its public IPv4 address.

  2. Assign an IAM role to the EC2 instance with the AmazonS3FullAccess policy attached.

  3. Run the Flask development server within the VM:

Installation

Before installing, check that the correct Python version is available by running python3 -V (the python3.10-venv package below targets Python 3.10).

sudo apt update && sudo apt upgrade
sudo apt install pandoc texlive python3.10-venv

Set up the Python venv and script

python3 -m venv venv
source venv/bin/activate
pip install pypandoc boto3 flask
mkdir inputs outputs
touch app.py

Open app.py in a text editor (nano, vim, etc.) and paste in the contents of the repository's app.py.
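app.py itself is not reproduced in this README. As a rough picture of what it does, here is a minimal sketch, assuming the Flask app receives a JSON body with bucket and filename fields on a /convert route and uploads results to a bucket named output-bucket (all placeholders). It uses the real boto3 `download_file`/`upload_file` and pypandoc `convert_file` calls; the function names are hypothetical.

```python
import os

# Assumed layout: the inputs/ and outputs/ directories created above,
# and a placeholder name for the output bucket.
INPUT_DIR = "inputs"
OUTPUT_DIR = "outputs"
OUTPUT_BUCKET = "output-bucket"


def conversion_paths(file_key):
    """Map an uploaded S3 key to (local input path, local output path, output key)."""
    # basename() drops any directory parts, so a crafted key cannot write outside inputs/.
    name = os.path.basename(file_key)
    stem, _ = os.path.splitext(name)
    output_key = stem + ".pdf"
    return os.path.join(INPUT_DIR, name), os.path.join(OUTPUT_DIR, output_key), output_key


def convert_docx_to_pdf(source_bucket, file_key):
    """Download a .docx from S3, convert it to PDF with pandoc, upload the result."""
    # Third-party imports stay local so conversion_paths() works without them installed.
    import boto3
    import pypandoc

    s3 = boto3.client("s3")  # credentials come from the instance's IAM role
    input_path, output_path, output_key = conversion_paths(file_key)
    s3.download_file(source_bucket, file_key, input_path)
    # PDF output needs an explicit output file and a LaTeX engine (installed via texlive).
    pypandoc.convert_file(input_path, "pdf", outputfile=output_path)
    s3.upload_file(output_path, OUTPUT_BUCKET, output_key)
    return output_key


def create_app():
    """Expose the conversion as a POST endpoint (route and JSON fields are assumptions)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/convert")
    def convert():
        data = request.get_json()
        output_key = convert_docx_to_pdf(data["bucket"], data["filename"])
        return jsonify({"output_key": output_key})

    return app
```

A script like this would be started with `create_app().run(host="0.0.0.0", port=5000)`. Flask's development server listens only on 127.0.0.1 by default, so `host="0.0.0.0"` is what makes it reachable on the instance's public IPv4 address.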

nohup venv/bin/python3 app.py > log.txt 2>&1 &
  • The Flask app should now keep handling requests. nohup runs it as a background process that keeps running after you exit the remote shell, for as long as the VM is up.
  • The app's logs, stdout and stderr are written to log.txt in the same directory.
  • The trailing & sends the process to the background, and the shell prints its process ID (also available as $!). Record it so you can stop the app later with kill <PID>.
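If you prefer to manage the process from Python, the standard-library sketch below is a rough analogue of `nohup ... > log.txt 2>&1 &` and `kill <PID>`. It is an illustration, not part of the repository; start_background, is_running and stop are hypothetical helper names, and it should be run with the venv's interpreter so app.py can import Flask.

```python
import os
import signal
import subprocess
import sys


def start_background(script, log_path, python=sys.executable):
    """Start `python script` detached from the terminal, logging to log_path."""
    with open(log_path, "ab") as log:
        # start_new_session=True puts the child in its own session, so the hang-up
        # sent when the SSH session closes does not reach it (similar to nohup).
        proc = subprocess.Popen(
            [python, script],
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            stdin=subprocess.DEVNULL,
            start_new_session=True,
        )
    return proc  # proc.pid is the PID to record


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def stop(pid):
    """Send SIGTERM, the signal `kill <PID>` sends by default."""
    if is_running(pid):
        os.kill(pid, signal.SIGTERM)
```

For example, `start_background("app.py", "log.txt").pid` gives the PID to record, and `stop(pid)` later terminates the app.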

The Flask app should now be running on: http://{ec2-instance-public-ipv4-address}:5000

Use this address as the API endpoint URL in the trigger_converter.py Lambda function so that .docx files uploaded to S3 are sent to the Flask microservice for conversion.

Warning

This command only starts the web app. For full functionality, you also need to configure the instance's Security Group on AWS to allow inbound TCP connections on port 5000 from any IPv4 address (0.0.0.0/0).
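The inbound rule can also be added programmatically. Below is a minimal sketch using boto3's EC2 client method `authorize_security_group_ingress`; port_rule and open_port are hypothetical helper names, and the security group ID in the usage line is a placeholder.

```python
def port_rule(port, cidr="0.0.0.0/0", description="Flask converter"):
    """Build one inbound TCP rule in the IpPermissions shape the EC2 API expects."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


def open_port(group_id, port, cidr="0.0.0.0/0"):
    """Add the inbound rule to a security group (needs AWS credentials)."""
    import boto3  # imported lazily so port_rule() is usable without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[port_rule(port, cidr)]
    )
```

For example, `open_port("sg-0123456789abcdef0", 5000)`. The cidr parameter lets you narrow the source range if you know where requests will come from.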

Note

Set up the PNG and CSV converter microservices the same way, each in its own directory and listening on its own port.
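With several converters running, the trigger Lambda needs to pick an endpoint per file type. A minimal sketch, assuming routing by file extension; the ports 5001 and 5002 and the /convert route are hypothetical, since the README only states that each converter uses its own port.

```python
import os

# Hypothetical port assignments; only 5000 for .docx comes from this README.
CONVERTER_PORTS = {".docx": 5000, ".png": 5001, ".csv": 5002}


def converter_url(host, key, route="/convert"):
    """Pick the converter endpoint for an uploaded object based on its extension."""
    ext = os.path.splitext(key)[1].lower()  # case-insensitive: "scan.PNG" -> ".png"
    port = CONVERTER_PORTS.get(ext)
    if port is None:
        raise ValueError(f"no converter configured for {ext or 'files without an extension'}")
    return f"http://{host}:{port}{route}"
```

Each additional port also needs its own inbound rule in the instance's Security Group.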

Usage

Tip

If the web app demo videos below don't load in the README, please watch them on YouTube.


DOCX to PDF Conversion



PNG to PDF Conversion



S3 uploads-bucket for .docx files



S3 output-bucket for .pdf files



Flask App process running in EC2

Authors

This project was created by MLSA KIIT for the Cloud Computing Domain's Project Wing:

Version

| Version | Date           | Comments        |
|---------|----------------|-----------------|
| 1.0     | Jan 24th, 2024 | Initial release |

Future Roadmap

Website/API

  • File Validation and Sanitization on server side
  • Better PDF conversion engine to retain original formatting in higher quality
  • Better Error Handling

AWS Infrastructure

  • Actual implementation in production
  • Conversion feature between multiple file types
  • Implementing image compression using methods such as Huffman Encoding