w3resource

Python Program: Calculate Jaccard Similarity Coefficient


9. Jaccard Similarity Using Counter

Write a Python program to find the Jaccard similarity coefficient between two lists using 'Counter' objects.

Jaccard Similarity, also known as the Jaccard Index or Jaccard Coefficient, is a measure of how similar two sets are. It is defined as the size of the intersection of the two sets divided by the size of their union, so it ranges from 0 (no shared elements) to 1 (identical sets). It is commonly used in data mining, information retrieval, and natural language processing.

For example, consider two sets:

A = {apple, banana, orange, kiwi}

B = {banana, kiwi, pineapple}

The intersection of A and B is {banana, kiwi}, which has a cardinality of 2. The union of A and B is {apple, banana, orange, kiwi, pineapple}, which has a cardinality of 5. So, the Jaccard Similarity between sets A and B is 2/5, which is 0.4.
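
The arithmetic above can be checked with Python's built-in set type. This is a minimal sketch that uses plain sets rather than Counter objects; the variable names are illustrative only.

```python
# Sets from the worked example above; the names a and b are illustrative.
a = {"apple", "banana", "orange", "kiwi"}
b = {"banana", "kiwi", "pineapple"}

intersection = a & b   # elements present in both sets
union = a | b          # elements present in either set

# Jaccard similarity: |A ∩ B| / |A ∪ B|
jaccard = len(intersection) / len(union)
print(jaccard)  # 0.4
```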

Sample Solution:

Code:

from collections import Counter

def jaccard_similarity(list1, list2):
    # Count how often each element occurs in each list
    counter1 = Counter(list1)
    counter2 = Counter(list2)

    # '&' keeps the minimum count per element, '|' keeps the maximum
    intersection_count = sum((counter1 & counter2).values())
    union_count = sum((counter1 | counter2).values())

    # Both lists empty: avoid dividing by zero (returning 0.0 is a convention)
    if union_count == 0:
        return 0.0

    jaccard_coefficient = intersection_count / union_count
    return jaccard_coefficient

def main():
    list1 = ['Red', 'Green', 'Blue', 'Orange']
    list2 = ['Green', 'Pink', 'Blue']

    jaccard_coefficient = jaccard_similarity(list1, list2)
    print("List 1:", list1)
    print("List 2:", list2)
    print("Jaccard Similarity Coefficient:", jaccard_coefficient)

if __name__ == "__main__":
    main()

Output:

List 1: ['Red', 'Green', 'Blue', 'Orange']
List 2: ['Green', 'Pink', 'Blue']
Jaccard Similarity Coefficient: 0.4

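In the exercise above, the "jaccard_similarity()" function takes two lists and computes the Jaccard similarity coefficient using "Counter" objects. It first builds a counter for each list. The "&" operator then keeps the minimum count of each element (the intersection) and the "|" operator keeps the maximum count (the union); summing the counts of each result and dividing gives the coefficient. Because counters keep element counts, duplicates in the input lists affect the result. For lists without duplicates, as here, the value equals the set-based Jaccard index. The "main()" function prints the two lists and the coefficient.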
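
To see how element counts influence the Counter-based result, here is a small sketch with duplicated elements. The sample data is made up for illustration, and the set-based value is computed alongside only for comparison.

```python
from collections import Counter

# Illustrative data with repeated elements
c1 = Counter(["a", "a", "b"])        # a: 2, b: 1
c2 = Counter(["a", "b", "b", "c"])   # a: 1, b: 2, c: 1

overlap = c1 & c2    # minimum count per element -> a: 1, b: 1
combined = c1 | c2   # maximum count per element -> a: 2, b: 2, c: 1

# Counter-based (multiset) coefficient: 2 / 5
multiset_jaccard = sum(overlap.values()) / sum(combined.values())

# Set-based coefficient ignores duplicates: 2 / 3
set_jaccard = len(set(c1) & set(c2)) / len(set(c1) | set(c2))

print(multiset_jaccard)
print(set_jaccard)
```

The two values differ because the Counter version counts each repeated occurrence, while the set version counts every distinct element once.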

For more Practice: Solve these Related Problems:

  • Write a Python program to compute the Jaccard similarity coefficient between two lists by converting them to Counters and calculating the ratio of the intersection to the union.
  • Write a Python function that takes two lists, converts them to Counters, and returns the Jaccard similarity as a percentage.
  • Write a Python script to calculate the Jaccard similarity between two text documents by using word Counters and then printing the coefficient.
  • Write a Python program to compare the similarity of two lists using the Jaccard index, and then display the result as a float between 0 and 1.
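
As a starting point for the text-document and percentage variants above, the following is one possible sketch, not a reference solution. The function name "text_jaccard" is hypothetical, words are split on whitespace after lowercasing (punctuation is not stripped), and returning 0.0 for two empty texts is an assumed convention.

```python
from collections import Counter

def text_jaccard(text1, text2):
    # Hypothetical helper: naive tokenization by lowercasing and splitting on whitespace
    words1 = Counter(text1.lower().split())
    words2 = Counter(text2.lower().split())

    union_count = sum((words1 | words2).values())
    if union_count == 0:
        # Assumed convention: two empty texts give a similarity of 0.0
        return 0.0
    return sum((words1 & words2).values()) / union_count

score = text_jaccard("the cat sat on the mat", "the dog sat on the log")
print(f"{score:.2%}")  # 50.00%
```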
