
Python based image scraper for Google images. No API key required.


haukened/google-image-scraper


Google Image Scraper

A Python script for web scraping that collects images for machine learning model training.

a space python ready to steal your images

What is this?

While experimenting with image classification machine learning, I realized that for some niche industrial things, the images I needed to work with were very poorly represented in available public data sets.

Obviously the solution was to contact every single customer and ask them to take hundreds of pictures of things from every angle, right?

I created this script to kick-start some of those machine learning tasks, and realized it could really be useful for everyone else too.

Is this legal where I live/work/etc.?

(THIS IS NOT LEGAL ADVICE - I AM NOT A LAWYER)

tl;dr - Probably not.

Legality varies widely based on your geographic location and where your code is deployed. If you're in Europe using a virtual machine in the United States for a project that will be deployed in China, I can't really help you on the legality of that, or on which law applies.

In general, however, unless a license is explicitly granted for use of images, the copyright of those images belongs to the photographer / publisher. As such, re-distribution and use of those images may be restricted in some manner. It's up to you to comply with the law, and this software is being made available as a learning tool, without warranty.

How does Google feel about this?

To quote the Google Terms of Service:

Google reserves the right to suspend or terminate your access to the services or delete your Google Account if any of these things happen:

  • you materially or repeatedly breach these terms, service-specific additional terms or policies
  • we’re required to do so to comply with a legal requirement or a court order
  • your conduct causes harm or liability to a user, third party, or Google — for example, by hacking, phishing, harassing, spamming, misleading others, or scraping content that doesn’t belong to you

That being said, not all scraping is a ToS violation. People surf Google Images and download content for their own personal use every day. The difference comes down to how much, how fast, and, importantly, for what purpose. Take care to keep to a reasonable number of images, and use them only for proper purposes.
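
If you build your own tooling around this, one way to act on "how much, how fast" is to cap the number of items and pause between them. The snippet below is only a sketch of that idea and is not taken from scrape.py: the `throttled` helper, its parameters, and the file names are all hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(
    urls: Iterable[str],
    max_items: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield at most max_items URLs, pausing delay_seconds between them.

    Hypothetical helper for illustration; scrape.py may work differently.
    """
    for index, url in enumerate(urls):
        if index >= max_items:
            break  # hard cap on how much is fetched
        if index > 0:
            sleep(delay_seconds)  # pause before every item except the first
        yield url


# Demo with a fake sleep that records the pauses instead of waiting.
pauses = []
picked = list(
    throttled(
        ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
        max_items=3,
        delay_seconds=2.0,
        sleep=pauses.append,
    )
)
```

Passing `sleep` as a parameter keeps the pause easy to swap out in tests. In real use you would leave it at the default `time.sleep`.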

Quick Start

  1. Clone this repository somewhere on your machine

    git clone https://github.com/haukened/google-image-scraper

  2. Move into the cloned directory

    cd google-image-scraper

  3. Create a Python virtual environment (optional, but keeps dependencies clean)

    python3 -m venv .venv

    source ./.venv/bin/activate

  4. Install dependencies

    pip install -r requirements.txt

  5. Run the scraper

    ./scrape.py a bear riding a tricycle

  6. Explore the options

    ./scrape.py -h

    usage: scrape.py [-h] [-s] [-v] [-n NUMBER] [-o OUTPUT_DIRECTORY] [query ...]
    
    Downloads images from Google image search
    
    positional arguments:
    query                 the search query for images. aka "What you would type into google"
    
    options:
    -h, --help            show this help message and exit
    -s, --show-browser    show the browser during downloading
    -v, --verbose         show debug information
    -n NUMBER, --number NUMBER
                            number of images to download
    -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                            the path to store downloaded images
    
    Created by @haukened | Buy me a beer: https://beer.hauken.us
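
The help text above looks like standard `argparse` output, so an interface of this shape can be sketched as follows. This is a reconstruction from the help text only, not the actual source of scrape.py. The flag names and help strings come from the output above, while the default values (`10` and `"images"`) and the `build_parser` function name are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the help output shown above."""
    parser = argparse.ArgumentParser(
        prog="scrape.py",
        description="Downloads images from Google image search",
        epilog="Created by @haukened | Buy me a beer: https://beer.hauken.us",
    )
    # nargs="*" collects every bare word, so the query needs no quoting.
    parser.add_argument(
        "query",
        nargs="*",
        help='the search query for images. aka "What you would type into google"',
    )
    parser.add_argument(
        "-s", "--show-browser", action="store_true",
        help="show the browser during downloading",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="show debug information",
    )
    parser.add_argument(
        "-n", "--number", type=int, default=10,  # default is an assumption
        help="number of images to download",
    )
    parser.add_argument(
        "-o", "--output-directory", default="images",  # default is an assumption
        help="the path to store downloaded images",
    )
    return parser


# Equivalent to: ./scrape.py -s -n 500 jimmy kimmel laughing
args = build_parser().parse_args(["-s", "-n", "500", "jimmy", "kimmel", "laughing"])
search_term = " ".join(args.query)  # "jimmy kimmel laughing"
```

Collecting the query as a list of words and joining it afterwards is what lets you type a multi-word search without wrapping it in quotes.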
    

Examples

To show the browser during scraping, add the -s flag

  • ./scrape.py -s -n 500 jimmy kimmel laughing

Store the images somewhere else

  • ./scrape.py -n 1000 -o /home/<you>/datasets/bears/ the greatest threat to america
