Python Program to Count the Frequency of Each Word in a Given File by Removing Punctuation Characters
Problem Definition
Develop a Python program to read through the lines of the file, break each line into a list of words, remove all punctuation characters, and then loop through each of the words in the line and count the frequency of each word using a dictionary.
Step-by-Step Solution to the Problem
First, import the string module, which defines the punctuation constant. The maketrans() and translate() methods used later are methods of str.
Read the filename from the user and store it in a variable, say fname. Check whether the file exists by opening it in read mode. If the file cannot be opened, display a suitable message to the user in the except block of the program.
Create an empty dictionary to store the frequency of each word in the given file.
Use a for loop to read the contents of the file line by line, and remove any leading and trailing whitespace using the strip() function.
Use the maketrans() method to build a translation table that deletes every punctuation character, and apply it to the line with translate(). Then split the line into words and store them in a list called words.
Read one word at a time from the words list and check whether it is already present in the dictionary using the in operator. If the word is present, increase its value by 1; otherwise add the word to the dictionary with a value of 1.
Finally, print the dictionary, which contains the frequency of each word in the file: each key is a word and each value is that word's frequency.
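The punctuation-removal and counting steps above can be tried on a single string before working with a file. The following is a minimal sketch under stated assumptions: the helper name count_words and the sample text are illustrative and do not appear in the original program.

```python
import string

def count_words(line, counts):
    """Strip punctuation from one line and update the counts dictionary in place."""
    # Translation table that maps every punctuation character to None (deletes it).
    table = str.maketrans('', '', string.punctuation)
    cleaned = line.strip().translate(table)
    for word in cleaned.split():
        if word in counts:
            counts[word] += 1   # word seen before: increase its count
        else:
            counts[word] = 1    # first occurrence of this word
    return counts

counts = count_words("Hello, world! Hello...", {})
print(counts)  # {'Hello': 2, 'world': 1}
```

Note that the comparison is case-sensitive, so 'Hello' and 'hello' would be counted as different words.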
Contents of the sample input file used to demonstrate the program, say test1.txt:
HIT Nidasoshi. HIT,
VTU, BGM
HSIT Nidasoshi@!
@VTU! BELAGAVI
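To follow along, the sample file can be created with a few lines of Python instead of a text editor. This is only a convenience sketch; it assumes the file should be written to the current working directory and encoded as UTF-8, and the variable name sample_lines is illustrative.

```python
# The four sample lines shown above, one list entry per line of the file.
sample_lines = [
    "HIT Nidasoshi. HIT,",
    "VTU, BGM",
    "HSIT Nidasoshi@!",
    "@VTU! BELAGAVI",
]

# Write the lines to test1.txt, overwriting any existing file of that name.
with open("test1.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sample_lines) + "\n")
```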
Program Source code to Count Frequency of Word by Removing Punctuation Character
import string

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:
    # The file is missing or cannot be read
    print('File cannot be opened:', fname)
    exit()

counts = dict()
# Translation table that deletes every punctuation character
table = str.maketrans('', '', string.punctuation)
for line in fhand:
    line = line.strip()
    line = line.translate(table)
    words = line.split()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
fhand.close()
print(counts)
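As an alternative to updating the dictionary by hand, the standard library's collections.Counter can do the counting. The sketch below is not the approach used in this tutorial; it swaps in Counter and wraps the logic in a function so it can be reused. The function name word_frequencies and the UTF-8 encoding are assumptions made for illustration.

```python
import string
from collections import Counter

def word_frequencies(path):
    """Return a dict mapping each word in the file to its frequency."""
    # Translation table that deletes every punctuation character.
    table = str.maketrans('', '', string.punctuation)
    counts = Counter()
    # The with statement closes the file even if an error occurs while reading.
    with open(path, encoding="utf-8") as fhand:
        for line in fhand:
            # split() with no arguments also discards leading/trailing whitespace.
            counts.update(line.translate(table).split())
    return dict(counts)
```

Calling word_frequencies('test1.txt') returns a plain dictionary of the same shape as counts in the program above. If the file cannot be opened, open() raises an OSError (for example FileNotFoundError), which the caller can catch to print a message.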
Output of Program
Enter the file name: test1.txt
{'HIT': 2, 'Nidasoshi': 2, 'VTU': 2, 'BGM': 1, 'HSIT': 1, 'BELAGAVI': 1}
Note: In the output dictionary, each key is a word and each value is that word's frequency.
Summary:
This tutorial discusses how to develop a Python program to count the frequency of each word in a given file after removing punctuation characters.