Azure Blob Storage is well suited to storing unstructured data such as text files, images, or any other binary data. In this post, I’ll walk you through how to manage files in Azure Blob Storage using Python. I’ll cover everything from setting up your environment to reading and storing files efficiently.
Understanding Azure Blob Storage
Before diving into the code, it’s essential to understand what Azure Blob Storage is and why it’s a preferred choice for handling files in the cloud.
Azure Blob Storage is a service for storing large amounts of unstructured data. It’s highly scalable, secure, and supports three types of blobs: Block blobs, Append blobs, and Page blobs. For most scenarios, you’ll likely use Block blobs, which are optimized for storing text and binary data.
Setting Up Your Azure Storage Account
To begin, you’ll need an Azure Storage account. This is where your data will reside. If you haven’t set one up yet, you can easily create one in the Azure portal.
- Creating the Storage Account:
- Log in to the Azure portal.
- Navigate to “Storage accounts” and click on “Create.”
- Fill in the required details like subscription, resource group, and storage account name.
- Choose the performance and replication options based on your needs.
- Review and create the account.
- Creating a Container:
- Once your storage account is created, go to the account overview.
- Under the “Data storage” section, click on “Containers.”
- Click on “New container,” name it, and set the appropriate access level.
The container acts like a directory where your blobs (files) will be stored.
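The portal rejects container names that break Azure’s naming rules: 3 to 63 characters, only lowercase letters, digits, and hyphens, starting and ending with a letter or digit, and no consecutive hyphens. As a rough sketch, the hypothetical helper is_valid_container_name below (my own name, not part of any SDK) checks a name locally before you try to create the container. It does not cover special system containers such as $web.

```python
import re

# One or more runs of lowercase letters/digits separated by single hyphens,
# so a name cannot start or end with a hyphen or contain "--".
_NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules described above."""
    return 3 <= len(name) <= 63 and _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_container_name("my-container"))  # True
print(is_valid_container_name("My_Container"))  # False
```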
Retrieving the Azure Storage Connection String
To interact with your Azure Blob Storage using Python, you’ll need the storage account’s connection string. This string is like a key that provides access to your storage account and containers.
- Locate the Connection String:
- In the Azure portal, go to your storage account.
- Under “Security + networking,” click on “Access keys.”
- You’ll see two sets of keys (Key1 and Key2), each with a connection string.
- Click “Show keys” to reveal the connection string and copy it.
Make sure to store this connection string securely, as it provides full access to your storage account.
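To see why the string is so sensitive, it helps to look at what is inside it. A connection string is a semicolon-separated list of Key=Value fields, and one of those fields is the account key itself. The sketch below uses a hypothetical helper, parse_connection_string, and a made-up example value (not a real credential) to split the string into its parts.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys end in "=" padding.
        key, _, value = segment.partition("=")
        parts[key.strip()] = value.strip()
    return parts


# Made-up example value, not a real credential.
example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestorage;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
print(parsed["AccountName"])   # examplestorage
print("AccountKey" in parsed)  # True
```

Anyone who holds the AccountKey field can read, write, and delete data in the account, which is the reason to keep the whole string out of source control.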
Using Python to Manage Files in Azure Blob Storage
With your Azure Storage account and container set up, and your connection string in hand, it’s time to start managing files with Python.
Installing the Azure SDK for Python
The first step is to install the necessary Python package to interact with Azure Blob Storage. The azure-storage-blob package provides a simple and intuitive interface for working with blobs.
You can install it using pip:
pip install azure-storage-blob
Once installed, you’re ready to start writing some Python code to upload and download files.
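If you want to confirm the install worked before going further, one option is to ask Python which version of the package it can see. This is a small sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("azure-storage-blob"))
```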
Uploading Files to Azure Blob Storage
Uploading a file to Azure Blob Storage is straightforward. Below is a Python script that demonstrates how to do this. I’ve chosen TypeScript for code examples in my previous projects, but for interacting with Azure in this context, Python is more direct and commonly used.
Here’s a Python script to upload a file:
import os
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient
# Connection string copied from the Azure portal (placeholder value)
connect_str = "your_connection_string"
container_name = "your_container_name"
# Create a BlobServiceClient object
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# Create the container if it doesn't exist; create_container() raises
# ResourceExistsError when the container is already there
container_client = blob_service_client.get_container_client(container_name)
try:
    container_client.create_container()
except ResourceExistsError:
    pass
# Upload a file to the container
file_path = "path/to/your/local/file.txt"
blob_name = os.path.basename(file_path)
# Create a BlobClient to interact with the specific blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
# Upload the file; overwrite=True replaces an existing blob with the same name
with open(file_path, "rb") as data:
    blob_client.upload_blob(data, overwrite=True)
print(f"File {file_path} uploaded to {blob_name}.")
This script uploads a file from your local system to the specified Azure Blob Storage container. The BlobServiceClient is used to interact with the blob storage account, and the BlobClient interacts with individual blobs.
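The same idea extends to a whole folder. Below is a minimal sketch, assuming a hypothetical helper named upload_directory that takes a ContainerClient (like the container_client created in the script above) and uses each file’s path relative to the folder as its blob name. The overwrite=True choice is mine; drop it if existing blobs should never be replaced.

```python
from pathlib import Path


def upload_directory(container_client, local_dir: str) -> list:
    """Upload every file under local_dir and return the blob names used."""
    root = Path(local_dir)
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip sub-directories; only files become blobs
        # Blob names use "/" separators regardless of the local OS
        blob_name = path.relative_to(root).as_posix()
        with open(path, "rb") as data:
            # overwrite=True replaces any existing blob with the same name
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        uploaded.append(blob_name)
    return uploaded
```

Calling upload_directory(container_client, "path/to/local/folder") would upload a file stored at reports/2024.csv inside that folder as the blob reports/2024.csv.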
Downloading Files from Azure Blob Storage
Downloading files from Azure Blob Storage is just as simple. Here’s how you can do it, reusing the blob_client and blob_name from the upload script:
download_file_path = "path/to/downloaded/file.txt"
with open(download_file_path, "wb") as download_file:
    download_file.write(blob_client.download_blob().readall())
print(f"Blob {blob_name} downloaded to {download_file_path}.")
This script downloads a file from your Azure Blob Storage to your local machine. You can use it to retrieve any file stored in your blob container.
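For large blobs, readall() holds the entire file in memory before anything is written. As a sketch of a lighter alternative, the hypothetical helper download_to_path below writes the blob chunk by chunk, using the chunks() iterator on the object that download_blob() returns.

```python
def download_to_path(blob_client, download_file_path: str) -> int:
    """Stream a blob into a local file and return the number of bytes written."""
    bytes_written = 0
    downloader = blob_client.download_blob()
    with open(download_file_path, "wb") as download_file:
        # chunks() yields the blob piece by piece, so only one chunk
        # is held in memory at a time
        for chunk in downloader.chunks():
            bytes_written += download_file.write(chunk)
    return bytes_written
```

It takes the same kind of blob_client as the scripts above, for example download_to_path(blob_client, "path/to/downloaded/file.txt").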
Best Practices for Working with Azure Blob Storage
While the examples above show how to upload and download files, there are several best practices you should consider when working with Azure Blob Storage:
- Use Environment Variables for Connection Strings:
- Avoid hardcoding connection strings in your scripts. Instead, store them in environment variables for better security and flexibility. Example:
import os
connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
- Implement Error Handling:
- Always implement error handling when working with external services like Azure. This will help you manage network issues, permission problems, or other unexpected errors gracefully.
- Use Asynchronous Programming:
- For applications that require high performance or handle a large number of files, consider using asynchronous programming to avoid blocking the main thread. Example with azure.storage.blob.aio, the SDK’s async client (it needs an async HTTP transport such as aiohttp installed):
import asyncio
import os
from azure.storage.blob.aio import BlobServiceClient
async def upload_blob_async(container_client, blob_name, data):
    blob_client = container_client.get_blob_client(blob_name)
    await blob_client.upload_blob(data)
async def main():
    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    # Close the client's network session once the upload finishes
    async with blob_service_client:
        container_client = blob_service_client.get_container_client("my-container")
        await upload_blob_async(container_client, "myblob", b"data")
asyncio.run(main())
- Optimize for Cost:
- Azure charges based on storage size, transactions, and data egress. Regularly monitor your usage and consider using lifecycle management policies to delete or archive blobs that are no longer needed.
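To get a feel for what a cleanup rule would catch before you configure one, you can list the blobs that have not changed in a while. This is a client-side sketch, not a lifecycle management policy: find_stale_blobs is a hypothetical helper that relies on list_blobs() returning items with name and last_modified attributes.

```python
from datetime import datetime, timedelta, timezone


def find_stale_blobs(container_client, max_age_days: int, now=None) -> list:
    """Return names of blobs last modified more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # list_blobs() yields blob properties, including name and last_modified
    return [
        blob.name
        for blob in container_client.list_blobs()
        if blob.last_modified < cutoff
    ]
```

The returned names could then be passed one at a time to container_client.delete_blob(name). For routine cleanup, though, a lifecycle management policy on the storage account does the same job without running any code.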
Wrapping Up
Azure Blob Storage is a robust service for managing unstructured data in the cloud. With the Azure SDK for Python, handling files becomes a seamless process, whether you’re working with a handful of files or millions. By following the best practices outlined above, you can ensure that your data is stored securely, efficiently, and cost-effectively.
Whether you’re building a simple file storage system or a complex data pipeline, Azure Blob Storage has the flexibility to meet your needs. And with Python’s extensive libraries and support, integrating it into your workflows is both simple and powerful.