PPSNet: Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos (ECCV, 2024)

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

🔥 Please remember to ⭐ this repo if you find it useful and cite our work if you end up using it in your work! 🔥

🔥 If you have any questions or concerns, please create an issue 📝! 🔥

Pre-print | Project Website

📖 Abstract

Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page:

🔧 Setup

STEP 1: bash

STEP 2: conda activate ppsnet

STEP 3: pip3 install -r requirements.txt

STEP 4: Install PyTorch using the command below:

pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url

The exact versions may vary depending on your computing environment and the GPUs you have access to. See this article on maintaining multiple system-level versions of CUDA.
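After installation, a quick check along the following lines can confirm which PyTorch build is active and whether it can see a GPU. This is a minimal sketch and not part of this repo: `describe_torch_env` is a hypothetical helper name, and it reports a missing install instead of raising an error.

```python
import importlib.util


def describe_torch_env():
    """Summarize the installed PyTorch build (hypothetical helper, not part of this repo)."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a machine with a GPU, the installed wheel probably does not match the system's CUDA setup, and a different PyTorch build may be needed.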

STEP 5: Download the C3VD dataset. Our preprocessing steps for the dataset involve performing calibration and undistorting the images (a script for this will be released in the near future). We've provided a validation portion of the dataset on Google Drive for reference and ease of use with this repo's evaluation code. You can download that portion of the dataset here (~29GB). Please note the original licensing terms of the C3VD data.

STEP 6: Download the appropriate pre-trained models and place them in a newly created folder called checkpoints/.

💻 Usage

You can evaluate our backbone model using the script:

python3 --data_dir /your/path/to/data/dir --log_dir ./your_path_to_log_dir --ckpt ./your_path_to_checkpoint

Similarly, our teacher model and our student model can be evaluated using the script:

python3 --data_dir /your/path/to/data/dir --log_dir ./your_path_to_log_dir --ckpt ./your_path_to_checkpoint

In addition to reporting metrics such as abs_rel and RMSE, both scripts create several folders in the specified log_dir containing input images, ground-truth and estimated depth maps, and percent depth error maps. Please keep an eye on this repo for future updates, including a full release of the training code, baselines included in the paper, mesh generation and visualization code, and more.
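For reference, abs_rel and RMSE are commonly defined in the depth estimation literature as shown below. This is a minimal NumPy sketch and not the repo's evaluation code: the function name `depth_metrics`, the `min_depth` validity threshold, and the percent error formula are assumptions made for illustration.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-6):
    """Compute abs_rel, RMSE, and a per-pixel percent error map.

    Hypothetical helper: pixels with gt <= min_depth are treated as invalid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    valid = gt > min_depth
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    diff = pred[valid] - gt[valid]
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = float(np.mean(np.abs(diff) / gt[valid]))
    # Root of the mean squared error, in the same units as the depth.
    rmse = float(np.sqrt(np.mean(diff ** 2)))

    # Percent error per pixel; invalid pixels are set to NaN.
    pct_err = np.full(gt.shape, np.nan)
    pct_err[valid] = 100.0 * np.abs(diff) / gt[valid]
    return abs_rel, rmse, pct_err


if __name__ == "__main__":
    gt = [[1.0, 2.0], [4.0, 0.0]]  # 0.0 marks a pixel without ground truth
    pred = [[1.1, 1.8], [4.0, 3.0]]
    abs_rel, rmse, pct_err = depth_metrics(pred, gt)
    print(f"abs_rel={abs_rel:.4f} rmse={rmse:.4f}")
```

Masking invalid ground-truth pixels before averaging keeps missing depth values from skewing either metric. The scripts in this repo may use different masking or scaling conventions.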

📜 Acknowledgments

Thanks to the authors of Depth Anything and NFPS for their wonderful repos with open-source code!

📜 Citation

If you find our paper or this toolbox useful for your research, please cite our work.

  title={Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos},
  author={Paruchuri, Akshay and Ehrenstein, Samuel and Wang, Shuxian and Fried, Inbar and Pizer, Stephen M and Niethammer, Marc and Sengupta, Roni},
  journal={arXiv preprint arXiv:2403.17915},