Creating a Git-Like Diff Viewer in Python Using Difflib

When I first stumbled upon the need to show differences between strings in a Git-like format, I wasn’t sure where to start. The idea seemed daunting—how could I highlight additions, removals, and changes in a clear, color-coded way? Luckily, Python’s difflib module came to the rescue. What started as a challenge turned into a rewarding exploration of how to create a diff viewer in Python.


If you’ve ever wanted to compare strings or text files in a clean, visually intuitive way, this is the perfect opportunity to get hands-on with Python’s built-in tools. Here’s how I went about it, step by step.

Why Use Python’s Difflib?

Comparing text is a common task, whether you’re debugging, writing tests, or reviewing changes to data. Tools like Git have made us accustomed to seeing differences clearly, with removals, additions, and even character-level hints. Python’s difflib module provides this functionality straight from the standard library, with no external dependencies. It’s versatile, quick enough for everyday text, and integrates seamlessly into scripts or CLI applications.

Setting Up the Foundation

The first step in building a Git-like diff viewer was to identify the core requirements. I wanted to:

  1. Compare two strings or pieces of text.
  2. Highlight additions in green and removals in red, mimicking Git’s style.
  3. Optionally show hints where character-level changes occur.

Python’s difflib module offers multiple ways to compare text. For this project, I focused on two primary tools: the difflib.Differ class for a granular comparison and the difflib.unified_diff function for a streamlined summary.

Building a Diff Viewer with Difflib.Differ

To start, I created a function that uses difflib.Differ. This class breaks down the comparison into detailed line-by-line and character-by-character differences. Here’s the code:

Python
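import difflib

def show_diff(string1, string2):
    """Print a line-by-line diff of two strings with ANSI colors."""
    lines1 = string1.splitlines()
    lines2 = string2.splitlines()

    differ = difflib.Differ()
    diff = differ.compare(lines1, lines2)

    for line in diff:
        # Differ ends its "? " hint lines with a newline; strip it so
        # print() does not emit a blank line after each hint.
        line = line.rstrip("\n")
        if line.startswith("- "):
            print(f"\033[31m{line}\033[0m")  # Red for removals
        elif line.startswith("+ "):
            print(f"\033[32m{line}\033[0m")  # Green for additions
        elif line.startswith("? "):
            print(f"\033[33m{line}\033[0m")  # Yellow for hints
        else:
            print(line)  # Unchanged lines (prefixed with two spaces)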

In this function, splitlines breaks the strings into individual lines, so the comparison works line by line. The differ.compare method returns a generator of lines, each starting with a two-character code (a symbol followed by a space) that indicates its status:

  • -: Indicates a line that appears only in the first string.
  • +: Indicates a line that appears only in the second string.
  • ?: Marks a hint line, present in neither input, that points at the characters that differ in the line above it.
  • Two spaces: Indicates a line common to both strings.
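To make the prefixes concrete, here is a minimal sketch that groups the raw output of Differ by its two-character code. The helper name classify_diff and the sample lists are my own illustrations, not part of difflib.

```python
import difflib


def classify_diff(old_lines, new_lines):
    """Group Differ output by prefix (hypothetical helper for illustration)."""
    groups = {"- ": [], "+ ": [], "? ": [], "  ": []}
    for line in difflib.Differ().compare(old_lines, new_lines):
        # The first two characters are the status code; the rest is the text.
        groups[line[:2]].append(line[2:].rstrip("\n"))
    return groups


result = classify_diff(["alpha", "beta", "gamma"], ["alpha", "gamma", "delta"])
print(result["- "])  # lines only in the first list
print(result["+ "])  # lines only in the second list
print(result["  "])  # lines common to both lists
```

Working with the raw prefixes like this, before any color is applied, also makes the comparison easy to check in a test.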

Using ANSI escape codes, I added color to the output. This made the diff viewer both functional and visually appealing.
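One caveat with raw escape codes is that they show up as literal characters when the output is piped to a file. The sketch below is one possible way to handle that, not something difflib provides: the colorize helper and its enabled parameter are hypothetical names, and the policy of checking sys.stdout.isatty() is an assumption on my part.

```python
import sys

# The same ANSI codes used in show_diff above.
RED = "\033[31m"
GREEN = "\033[32m"
YELLOW = "\033[33m"
RESET = "\033[0m"


def colorize(text, color, enabled=None):
    """Wrap text in an ANSI color, or return it unchanged (hypothetical helper).

    enabled=None means "decide automatically": color only when stdout
    is an interactive terminal.
    """
    if enabled is None:
        enabled = sys.stdout.isatty()
    if not enabled:
        return text
    return f"{color}{text}{RESET}"


print(colorize("- removed line", RED))
print(colorize("+ added line", GREEN))
```

With a helper like this, each print call in show_diff could pass its line through colorize instead of embedding the codes directly.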

Enhancing the Output with Unified Diff

While difflib.Differ gives a detailed breakdown, sometimes a summarized format works better, especially for longer texts. This is where difflib.unified_diff shines. Unlike Differ, it shows only the lines that differ, along with a few lines of surrounding context (three on each side by default).

Here’s how I implemented it:

Python
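import difflib

def show_unified_diff(string1, string2):
    """Print a unified diff of two strings with ANSI colors."""
    lines1 = string1.splitlines()
    lines2 = string2.splitlines()

    diff = difflib.unified_diff(lines1, lines2, lineterm="")
    for index, line in enumerate(diff):
        if index < 2:
            # The first two lines are the ---/+++ file headers, not changes.
            print(f"\033[1m{line}\033[0m")   # Bold for file headers
        elif line.startswith("@@"):
            print(f"\033[36m{line}\033[0m")  # Cyan for hunk headers
        elif line.startswith("-"):
            print(f"\033[31m{line}\033[0m")  # Red for removals
        elif line.startswith("+"):
            print(f"\033[32m{line}\033[0m")  # Green for additions
        else:
            print(line)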

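With unified_diff, the output is more concise. It opens with --- and +++ file header lines, introduces each changed region with an @@ hunk header, and then lists only the changed lines, prefixed by a - or +, with surrounding lines for context. This mirrors what you’d see in a Git diff.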
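The headers and the amount of context are both adjustable through unified_diff’s fromfile, tofile, and n parameters. As a rough sketch (the wrapper name unified_diff_lines and the default labels "before" and "after" are my own choices for illustration):

```python
import difflib


def unified_diff_lines(old_text, new_text, old_label="before",
                       new_label="after", context=1):
    """Return unified-diff lines with labeled headers (hypothetical wrapper)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,   # text shown after "---"
            tofile=new_label,     # text shown after "+++"
            n=context,            # lines of context around each change
            lineterm="",          # inputs carry no trailing newlines
        )
    )


lines = unified_diff_lines("a\nb\nc\nd\ne", "a\nb\nX\nd\ne")
for line in lines:
    print(line)
```

Returning a list instead of printing keeps the diff logic separate from the coloring, which makes both easier to reuse.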

Comparing the Two Approaches

Both Differ and unified_diff are useful, but they serve slightly different purposes. If you’re looking for a detailed comparison that highlights even minor changes within a line, Differ is your best bet. On the other hand, if you want a clean summary of the differences, unified_diff is the way to go.

For instance, when comparing the strings:

string1 = "The dog in the hat"
string2 = "The cat in the hat"

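Differ would show the two lines together with hint lines whose ^ markers sit under the three characters that changed:

- The dog in the hat
?     ^^^
+ The cat in the hat
?     ^^^

Meanwhile, unified_diff would show the following (the header lines are bare because no file names were passed):

---
+++
@@ -1 +1 @@
-The dog in the hat
+The cat in the hat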

Adding Polish and Practical Use Cases

Once the basic functionality was in place, I wanted to make the tool user-friendly for various scenarios:

  1. CLI Integration: Wrapping the functions in a script allows users to pass files or strings as arguments.
  2. Error Reporting: Comparing expected vs. actual outputs in tests can be enhanced with color-coded diffs.
  3. Code Reviews: Quickly highlight changes between versions of a file.
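The error-reporting case might look something like the sketch below. The helper name assert_text_equal and the "expected" and "actual" labels are hypothetical; the idea is simply to attach a unified diff to the failure message.

```python
import difflib


def assert_text_equal(expected, actual):
    """Raise AssertionError carrying a unified diff when the texts differ.

    Hypothetical test helper. Note that the diff is line-based, so texts
    that differ only in a trailing newline produce an empty diff body.
    """
    if expected == actual:
        return
    diff = "\n".join(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    raise AssertionError(f"texts differ:\n{diff}")


# Identical texts pass silently.
assert_text_equal("line one\nline two", "line one\nline two")
```

A failing call raises an AssertionError whose message contains the -/+ lines, so the mismatch is visible directly in the test report.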

For example, to integrate this as a CLI tool:

Python
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compare two strings or files and display a Git-like diff.")
    parser.add_argument("string1", help="First string or file to compare")
    parser.add_argument("string2", help="Second string or file to compare")
    args = parser.parse_args()

    show_unified_diff(args.string1, args.string2)

As written, this script compares its two arguments as literal strings. To support files as well, as the help text suggests, each argument would first need to be resolved to the contents of the file it names.
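One possible way to handle the file case is a small loader that checks whether an argument names an existing file. This is only a sketch: load_text is a hypothetical helper, and the UTF-8 encoding and the "fall back to the literal string" policy are assumptions on my part.

```python
from pathlib import Path


def load_text(value):
    """Return the file's contents if value names a file, else value itself.

    Hypothetical helper; assumes files are UTF-8 encoded.
    """
    try:
        path = Path(value)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Not usable as a path (for example, too long): treat it as text.
        pass
    return value


print(load_text("The dog in the hat"))  # no such file, so the string itself
```

In the CLI script, the final call would then become show_unified_diff(load_text(args.string1), load_text(args.string2)). The trade-off is ambiguity: a string that happens to match a file name is read as a file, so explicit flags may be a safer design for a real tool.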

Lessons Learned Along the Way

As I worked through this project, a few key insights stood out:

  • Python’s built-in libraries are often more powerful than they appear. Difflib, in particular, is a treasure trove for text comparison.
  • Visualizing data with color can make a significant difference in usability. Even simple ANSI codes transform plain text into something much more engaging.
  • It’s easy to customize tools to fit specific needs. Whether you’re debugging code, comparing files, or building a CLI tool, the flexibility of Python makes it possible.

Wrapping It Up

Creating a Git-like diff viewer in Python was both an educational and practical exercise. With just a few lines of code, I was able to approximate the look of git diff output, tailor it to specific use cases, and even add a personal touch with color-coding.

Whether you’re a developer debugging text differences, a tester comparing outputs, or just someone curious about Python, exploring difflib is well worth your time. The possibilities are endless, and the tools you create can save countless hours down the line.
