Python program to retrieve all links from a given Webpage
Write a Python program to retrieve all links from a given webpage and save them to a text file.
Installing Necessary Libraries to get all links from a given Webpage
The following libraries are required for the Python program that retrieves all links from a given webpage:
- BeautifulSoup4
- requests
Use the following commands to install the above libraries:
pip install beautifulsoup4==4.9.2
pip install requests==2.24.0
Source Code to retrieve all links from a given Webpage using BeautifulSoup
import requests as rq
from bs4 import BeautifulSoup

url = input("Enter website Link: ")

# If the link already starts with http:// or https://, pass it to rq.get() as is;
# otherwise prepend https:// before calling rq.get()
if url.startswith(("http://", "https://")):
    data = rq.get(url)
else:
    data = rq.get("https://" + url)

# Parse the HTML data using BeautifulSoup's html.parser
s = BeautifulSoup(data.text, "html.parser")

# Collect the href of every <a> tag, skipping tags that have no href
links = []
for link in s.find_all("a"):
    href = link.get("href")
    if href:
        links.append(href)

print("All links of the given website:")
for link in links:
    print(link)

# Write the links to a file (myLinks1.txt), one link per line
with open("myLinks1.txt", "w") as saved:
    for link in links:
        print(link, file=saved)
Explanation:
First, import the necessary libraries or modules. Next, ask the user to enter the website link.
Once the user enters the website link, check whether it already includes the http or https scheme. If it does, call rq.get(url) directly; otherwise, prepend https:// to the URL before calling rq.get().
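The scheme check can be isolated into a small helper, which makes it easy to try on a few inputs. This is only a sketch: the function name with_scheme is made up for illustration and is not part of the original program. Note that in Python an expression such as ("https" or "http") in url evaluates to "https" in url, so a prefix test with str.startswith is the more reliable way to write this check.

```python
def with_scheme(url: str) -> str:
    """Return url unchanged if it starts with http:// or https://,
    otherwise prepend https:// (hypothetical helper, for illustration)."""
    url = url.strip()
    # str.startswith accepts a tuple and returns True if any prefix matches
    if url.lower().startswith(("http://", "https://")):
        return url
    return "https://" + url


print(with_scheme("vtupulse.com"))
print(with_scheme("http://example.com"))
```

The result of with_scheme(url) can then be passed to rq.get().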
Parse the HTML data with BeautifulSoup, using its built-in html.parser.
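To see what the parsing step produces without a network connection, the same calls can be run on a hard-coded HTML string. The markup and the name sample_html below are invented for this sketch; in the program above, data.text plays that role.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for data.text
sample_html = """
<html><body>
  <a href="https://example.com/">Home</a>
  <a href="/about">About</a>
  <a name="top">No href here</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all("a") returns every <a> tag in document order
anchors = soup.find_all("a")

# Tag.get() returns None when the attribute is missing
hrefs = [a.get("href") for a in anchors]

print(len(anchors))
print(hrefs)
```

The third tag has no href, so its entry in hrefs is None. Any code that later slices or writes the links has to allow for that.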
Find all the <a> tags, read the value of each tag's href attribute, and store it in the links list. Finally, display the links on screen and save them to a text file.
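The final step, filtering the collected values and saving them, can be sketched on its own. The href values below are made up, and writing one link per line into a temporary directory is a choice made for this example so that it does not overwrite any existing file.

```python
import os
import tempfile

# Hypothetical href values, as link.get("href") might return them;
# None stands for an <a> tag that has no href attribute
raw_hrefs = ["https://example.com/", None, "/about", "https://example.com/contact"]

# Drop missing values so every entry is a non-empty string
links = [h for h in raw_hrefs if h]

# Write one link per line; a temporary directory keeps the demo self-contained
out_path = os.path.join(tempfile.mkdtemp(), "myLinks1.txt")
with open(out_path, "w", encoding="utf-8") as saved:
    saved.write("\n".join(links) + "\n")

# Read the file back to confirm what was stored
with open(out_path, encoding="utf-8") as saved:
    stored = saved.read().splitlines()

print(stored)
```

Storing one link per line keeps the file easy to read back with splitlines() or to process with other tools.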
Output:
Enter website Link: https://vtupulse.com/
All links of the given website:
https://vtupulse.com/
https://vtupulse.com/category/cplusplus-programs/
https://vtupulse.com/category/computer-graphics/
https://vtupulse.com/python-programs/python-application-programming-tutorial/
https://vtupulse.com/julia-tutorial/introduction-to-julia-julia-tutorial/
https://vtupulse.com/cbcs-cse-notes/big-data-analytics-17cs82-vtu-cbcs-notes/
https://vtupulse.com/cbcs-cse-notes/15cs73-machine-learning-vtu-notes/
https://vtupulse.com/category/perl/
Summary:
This tutorial showed how to write a Python program that retrieves all links from a given webpage and saves them to a text file.