
I Use Python Beautiful Soup for Web Scraping?

tramngocnguyenhcm

Tools: Python (Beautiful Soup, Pandas), Excel, Tableau


1. What is Web Scraping?

Web scraping is the process of extracting data from websites. It involves using software or scripts to access a website's HTML code, parse it, and extract specific information from it. This information can include text, images, links, and more, depending on the needs of the scraper. Web scraping is commonly used for various purposes, such as data collection, data analysis, research, and automation.

(Source: ChatGPT)
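To make the "access, parse, extract" idea concrete, here is a minimal sketch that uses only Python's standard library. The sample HTML and the `LinkCollector` class are made up for illustration; a real scraper would first download the page instead of using a hard-coded string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href and visible text of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (href, text) tuples
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Example page</h1>
  <p>See <a href="/about">About us</a> and <a href="/contact">Contact</a>.</p>
</body></html>
"""

parser = LinkCollector()
parser.feed(SAMPLE_HTML)  # parse the HTML and extract the links
print(parser.links)
```

Libraries such as Beautiful Soup wrap this kind of parsing in a much friendlier interface, which is why I use one in this project.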


2. What is Beautiful Soup?

In this project, I use Beautiful Soup, a Python library for web scraping, to pull data out of HTML and XML files. The extracted data can then be cleaned and reshaped into whatever format we need.


Example of code using the Beautiful Soup library (source: ChatGPT)
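Independently of the example credited above, here is a minimal sketch of how Beautiful Soup can pull rows out of an HTML table. The HTML string, the company names, and the `wikitable` class used to find the table are assumptions for illustration, not the exact code from my notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a downloaded page that contains a table.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Alpha Corp</td></tr>
  <tr><td>2</td><td>Beta Inc</td></tr>
</table>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Locate the table, then read the header cells and the data cells.
table = soup.find("table", class_="wikitable")
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(cells)

print(headers)
print(rows)
```

With a live page, the only change would be to fetch the HTML first (for example with the `requests` package) and pass the response text to `BeautifulSoup` instead of the sample string.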


3. My Web Scraping Project

In this project, I chose Wikipedia as the source website and extracted data on the 100 largest companies in the US.



I used Beautiful Soup to scrape the data, then Pandas to clean it and export the final dataset to a CSV file that can be opened in Excel. Finally, I connected the CSV file to Tableau to visualize the data and gain insights from it.
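The cleaning and export step can be sketched roughly as follows. The column names, the made-up company rows, and the specific cleaning steps (removing thousands separators, splitting the headquarters into city and state) are assumptions for illustration; the actual notebook may differ.

```python
import io

import pandas as pd

# Hypothetical scraped output: every value arrives as a string.
headers = ["Rank", "Name", "Revenue (USD millions)", "Headquarters"]
rows = [
    ["1", "Alpha Corp", "2,500", "Springfield, Illinois"],
    ["2", "Beta Inc", "1,000", "Austin, Texas"],
]

df = pd.DataFrame(rows, columns=headers)

# Convert numeric columns from text to integers.
df["Rank"] = df["Rank"].astype(int)
df["Revenue (USD millions)"] = (
    df["Revenue (USD millions)"].str.replace(",", "", regex=False).astype(int)
)

# Split "City, State" into two columns, which is handy for mapping later.
df[["City", "State"]] = df["Headquarters"].str.split(", ", n=1, expand=True)

# Write the CSV to an in-memory buffer here; pass a file path such as
# "companies.csv" (a hypothetical name) to save it to disk instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Setting `index=False` keeps the Pandas row index out of the file, so Excel and Tableau see only the real columns.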


See my code on GitHub


Here is the final result in Python (Jupyter Notebook).



After exporting the data from Jupyter Notebook to a .csv file and opening it in Excel, I had a clean, well-organized dataset.



You can freely download my file here: Top 100 largest companies in the US



To visualize the data, I used Tableau to create a map that displays the density of large corporations across the US. Hovering over a dot shows detailed information about the corresponding company.




My dashboard is now published on Tableau Public.


In conclusion, this project demonstrates an end-to-end data analytics process. First, I scraped raw data from a website using Python's Beautiful Soup and cleaned it with Pandas. Next, I exported the data to a CSV file and organized it in an Excel worksheet. Finally, I connected Tableau to that file and built interactive, informative visualizations.


To interact with my dashboard, view my visualizations on Tableau Public.





 

Additional information:

I learned this practical approach to web scraping with Beautiful Soup (Python) from Alex The Analyst, one of today's big data gurus.




©2021 by Tram Nguyen
