Data and Image Scraping of the IMDb Website Using Python BeautifulSoup

Data scraping, also known as web scraping, is the process of extracting information from a website and storing it in a structured form, such as a table or database. It is an efficient way to collect data from the web at scale. Web scraping can also be used to download the images on a given website.

BeautifulSoup

Beautiful Soup is a Python library for parsing HTML and XML documents, which makes it easy to extract information from web pages. We use it here to scrape data and images from the IMDb website.
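As a minimal illustration of what parsing with Beautiful Soup looks like, the snippet below parses a small, made-up HTML fragment (not IMDb's real markup) and pulls out text and attributes. It uses Python's built-in "html.parser"; since this project imports lxml, passing "lxml" as the parser should also work when that package is installed.

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment, used only for illustration (not IMDb's actual markup).
html = """
<div class="item">
  <h3 class="name"><a href="/name/example/">Example Actor</a></h3>
  <img src="https://example.com/photo.jpg" alt="Example Actor">
</div>
"""

# Build the parse tree; "lxml" could be used instead of the built-in "html.parser".
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; class_ filters on the CSS class attribute.
name_tag = soup.find("h3", class_="name")
name = name_tag.get_text(strip=True)

# Attributes are read like dictionary keys.
link = name_tag.a["href"]
image_url = soup.find("img")["src"]

print(name, link, image_url)  # → Example Actor /name/example/ https://example.com/photo.jpg
```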

Required Modules

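import pandas as pd                 # DataFrame that holds the scraped data
import re                           # regular expressions, e.g. for cleaning scraped text
import lxml                         # fast HTML parser backend that BeautifulSoup can use
from PIL import Image               # Pillow: open and save the downloaded images
from io import BytesIO              # wrap downloaded bytes as a file-like object for Pillow
import os                           # file and folder handling (e.g. the Images/ folder)
import webbrowser                   # open the generated HTML table in a browser
from bs4 import BeautifulSoup       # parse the HTML of the IMDb page
from requests import get            # send HTTP GET requests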
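The exact role of re in the notebook is not shown here; a common use in scrapers is tidying the text pulled out of HTML. The helpers clean_text and strip_rank below are hypothetical examples of that, and strip_rank assumes list entries are prefixed with a position such as "12. ".

```python
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def strip_rank(text: str) -> str:
    """Remove a leading list position such as "12. " (assumed entry format)."""
    return re.sub(r"^\d+\.\s*", "", text)


# Scraped text often carries newlines and indentation from the page layout.
print(strip_rank(clean_text("\n  12.   Example\n     Actor ")))  # → Example Actor
```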

Data and Image Scraping of IMDb's "TOP 200 Best Indian Celebrities Of India" List

1. Import all the necessary libraries.
2. Open the IMDb list page: https://www.imdb.com/list/ls068010962/
3. Right-click anywhere on the page and select "Inspect".
4. Find the elements that correspond to the data we want to extract.
5. Note the tags, as well as attributes such as class and id; we will use them later.
6. Then follow the Indian_Movie_Celebrities_Database_Generator.ipynb notebook in this repository, which explains the code in detail.
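The tags and attributes noted while inspecting the page are what the scraper searches for. A minimal sketch of that step is below; the selector values (ITEM_TAG, ITEM_CLASS, NAME_TAG) and the helper names fetch_html and extract_names are hypothetical placeholders, not the notebook's actual code, and the User-Agent header is an assumption.

```python
from bs4 import BeautifulSoup
from requests import get

LIST_URL = "https://www.imdb.com/list/ls068010962/"

# Placeholder selectors (hypothetical): replace them with the tags and
# classes you noted while inspecting the live page.
ITEM_TAG = "div"
ITEM_CLASS = "list-item"
NAME_TAG = "h3"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML text."""
    # Sending a browser-like User-Agent is an assumption; some sites reject the default one.
    response = get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()  # raise on HTTP errors such as 403 or 404
    return response.text


def extract_names(html: str) -> list[str]:
    """Return the text of each item's name heading, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for item in soup.find_all(ITEM_TAG, class_=ITEM_CLASS):
        heading = item.find(NAME_TAG)
        if heading is not None:  # skip items without the expected heading
            names.append(heading.get_text(strip=True))
    return names
```

Usage would be along the lines of extract_names(fetch_html(LIST_URL)). Keeping the download and the parsing in separate functions lets the parsing be tried on saved HTML without hitting the site repeatedly.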
After the notebook has run, the generated DataFrame, holding information on all 200 celebrities (from the 1st right through to the 200th), is displayed as a table in the browser.
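Turning the scraped records into a table and viewing it in a browser could look roughly like the sketch below. The save_table helper and the column names are hypothetical; the notebook's own DataFrame layout is not shown here.

```python
from pathlib import Path
import webbrowser

import pandas as pd


def save_table(rows: list[dict], path: str, open_in_browser: bool = False) -> Path:
    """Write scraped rows to an HTML table file and optionally open it in the default browser."""
    df = pd.DataFrame(rows)
    df.index = range(1, len(df) + 1)  # number rows 1..N instead of 0..N-1
    out = Path(path).resolve()
    df.to_html(out)  # pandas writes the <table> markup straight to the file
    if open_in_browser:
        webbrowser.open(out.as_uri())  # a file:// URL works across operating systems
    return out
```

For example, save_table([{"Name": "Example Actor", "Image": "Images/example.jpg"}], "Top 200 Best Indian Actors and Actresses.html", open_in_browser=True) would write the file and open it. Building the URL with Path.as_uri() avoids hand-assembling file:// paths, which differ between Windows and Unix.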

Uploaded Files Information

Indian_Movie_Celebrities_Database_Generator.ipynb - The Jupyter notebook containing the Python code for data and image scraping.
Images/ (Folder) - Contains all the images scraped from the IMDb website.
Top 200 Best Indian Actors and Actresses.html - The HTML table generated by running the .ipynb file.
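The Images/ folder is filled by downloading each picture and writing it to disk. A minimal sketch of that step with requests and Pillow is below; the helper names save_image_bytes and download_image, the file-naming scheme, and the choice of JPEG output are assumptions, not necessarily what the notebook does.

```python
import os
from io import BytesIO

from PIL import Image
from requests import get


def save_image_bytes(data: bytes, name: str, folder: str = "Images") -> str:
    """Decode raw image bytes with Pillow and save them as a JPEG inside `folder`."""
    os.makedirs(folder, exist_ok=True)  # create the folder on first use
    image = Image.open(BytesIO(data))
    path = os.path.join(folder, f"{name}.jpg")
    image.convert("RGB").save(path, "JPEG")  # JPEG has no alpha channel, so convert first
    return path


def download_image(url: str, name: str, folder: str = "Images") -> str:
    """Fetch an image URL and store it via save_image_bytes."""
    response = get(url, timeout=30)
    response.raise_for_status()  # raise on HTTP errors instead of saving an error page
    return save_image_bytes(response.content, name, folder)
```

Splitting decoding from downloading means the saving logic can be checked with locally generated image bytes, without any network access.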

Repository: ILasya/Web_Scraping_Indian-Movie-Celebrities_Database_ILasya
