This issue has also been raised on Stack Overflow. View here.
I'm trying to scrape the official sites data from a title page on IMDb using Beautiful Soup. For example, to get the data for Interstellar, I have this code:
from bs4 import BeautifulSoup
import requests

url = 'https://www.imdb.com/title/tt0816692/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
title_detail_soup = soup.find('div', {'id': 'titleDetails'})
details_soup = title_detail_soup.find_all('div', class_='txt-block')
detail_list = ['Official Sites:', 'Country:', 'Language:',
               'Release Date:', 'Also Known As:', 'Filming Locations:']
details = {}
for detail in details_soup:
    try:
        # Each heading (h4) has detail heading
        head = detail.find('h4')
        if head.get_text() in detail_list:
            # If the detail heading is in the detail list
            if head.get_text() == 'Official Sites:':
                # If details is about official sites
                official_site = {}
                detail.h4.decompose()  # remove <h4> tags
                a_tags = detail.find_all('a')
                for a_tag in a_tags:
                    # exclude See more>> links
                    if a_tag.get_text() != 'See more':
                        data = url + a_tag['href']  # final link is base URL + hyperlink
                        official_site[a_tag.get_text()] = data
                details['official-sites'] = official_site
    except Exception as e:
        print(e)
print(details)  # Print the detail dictionary
HTML of the page:
Output dictionary: