A module that extracts the text of an article from its URL, built on the Requests and BeautifulSoup4 libraries. It contains the class MiniReadability with the following methods:

  • get_text() - returns the clean text of the article
  • write_in_file(text) - writes the text to a file under the path stored in self.path, "[CUR_DIR]/host/path_item1/path_item2/...", using the file name stored in MiniReadability.filename. The file is saved as UTF-8.
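
The path layout described for write_in_file could be sketched roughly as follows. This is only an illustration using the standard library, not the module's actual code: the function names url_to_file_path and save_text, the base_dir parameter, and the default file name "index.txt" are all assumptions made for the example.

```python
import os
from urllib.parse import urlparse


def url_to_file_path(url, base_dir=None, filename="index.txt"):
    """Map a URL to base_dir/host/path_item1/path_item2/.../filename.

    Hypothetical helper: the name and the default filename are assumptions.
    """
    parsed = urlparse(url)
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [part for part in parsed.path.split("/") if part]
    root = base_dir if base_dir is not None else os.getcwd()
    return os.path.join(root, parsed.netloc, *segments, filename)


def save_text(text, path):
    """Create any missing directories, then write the text as UTF-8."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Under these assumptions, url_to_file_path("https://lenta.ru/news/2019/01/22/brain/") yields a path ending in lenta.ru/news/2019/01/22/brain/index.txt below the current directory.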

Example usage: "python readable.py https://lenta.ru/news/2019/01/22/brain/"

Tested on articles:

Further development plans:

  • a settings file with per-site processing templates, so that the clean article text is selected more accurately and without loss
  • bulk processing of several URLs at a time