Open Data Blend February 2022 Update
18th March 2022
By Open Data Blend Team
The Open Data Blend February 2022 update includes changes that extend the Open Data Blend for Python library to support cloud data lake storage services.
Open Data Blend Datasets
English Prescribing Data for January 2022 Is Available
We have updated the Prescribing dataset with the latest available NHS English Prescribing Data, which covers activity up to January 2022. You can download the data from the Open Data Blend Datasets Prescribing page, analyse it directly in supported BI tools through the Open Data Blend Analytics service, or instantly explore insights through the Open Data Blend Insights service.
Cloud Data Lake Integrations with Open Data Blend for Python
We have extended Open Data Blend for Python to support Azure Blob Storage, Azure Data Lake Storage (ADLS) Gen2, and Amazon S3 as target file systems.
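The updated package is available on PyPI, so you can pick up the new integrations by installing or upgrading it with pip (for example, pip install --upgrade opendatablend).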
With a few simple lines of code, you can quickly ingest our datasets into your data lake. Once ingested, you can interactively query and analyse the ORC and Parquet files using data lake analytics services like Amazon Athena, Azure Synapse Analytics, and Databricks.
Below are some examples of how to use the new cloud data lake integrations, followed by a short sketch of reading the ingested data back.
Ingesting Data Directly into Azure Blob Storage
import opendatablend as odb
dataset_path = 'https://packages.opendatablend.io/v1/open-data-blend-road-safety/datapackage.json'
access_key = '<ACCESS_KEY>' # The access key can be set to an empty string if you are making a public API request
# Specify the resource name of the data file. In this example, the 'date' data file will be requested in .parquet format.
resource_name = 'date-parquet'
# Get the data and store the output object using the Azure Blob Storage file system
configuration = {
    "connection_string": "DefaultEndpointsProtocol=https;AccountName=<AZURE_BLOB_STORAGE_ACCOUNT_NAME>;AccountKey=<AZURE_BLOB_STORAGE_ACCOUNT_KEY>;EndpointSuffix=core.windows.net",
    "container_name": "<AZURE_BLOB_STORAGE_CONTAINER_NAME>" # e.g. odbp-integration
}
output = odb.get_data(dataset_path, resource_name, access_key=access_key, file_system="azure_blob_storage", configuration=configuration)
# Print the file locations
print(output.data_file_name)
print(output.metadata_file_name)
Ingesting Data Directly into Azure Data Lake Storage (ADLS) Gen2
import opendatablend as odb
dataset_path = 'https://packages.opendatablend.io/v1/open-data-blend-road-safety/datapackage.json'
access_key = '<ACCESS_KEY>' # The access key can be set to an empty string if you are making a public API request
# Specify the resource name of the data file. In this example, the 'date' data file will be requested in .parquet format.
resource_name = 'date-parquet'
# Get the data and store the output object using the Azure Data Lake Storage Gen2 file system
configuration = {
    "connection_string": "DefaultEndpointsProtocol=https;AccountName=<ADLS_GEN2_ACCOUNT_NAME>;AccountKey=<ADLS_GEN2_ACCOUNT_KEY>;EndpointSuffix=core.windows.net",
    "container_name": "<ADLS_GEN2_CONTAINER_NAME>" # e.g. odbp-integration
}
output = odb.get_data(dataset_path, resource_name, access_key=access_key, file_system="azure_data_lake_storage_gen2", configuration=configuration)
# Print the file locations
print(output.data_file_name)
print(output.metadata_file_name)
Ingesting Data Directly into Amazon S3
import opendatablend as odb
dataset_path = 'https://packages.opendatablend.io/v1/open-data-blend-road-safety/datapackage.json'
access_key = '<ACCESS_KEY>' # The access key can be set to an empty string if you are making a public API request
# Specify the resource name of the data file. In this example, the 'date' data file will be requested in .parquet format.
resource_name = 'date-parquet'
# Get the data and store the output object using the Amazon S3 file system
configuration = {
    "aws_access_key_id": "<AWS_ACCESS_KEY_ID>",
    "aws_secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
    "bucket_name": "<BUCKET_NAME>", # e.g. odbp-integration
    "bucket_region": "<BUCKET_REGION>" # e.g. eu-west-2
}
output = odb.get_data(dataset_path, resource_name, access_key=access_key, file_system="amazon_s3", configuration=configuration)
# Print the file locations
print(output.data_file_name)
print(output.metadata_file_name)
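To check that an ingestion worked, you can read the file straight back from your data lake. The snippet below is a minimal sketch for the Amazon S3 example, assuming the s3fs package is installed and that output.data_file_name holds the file's path within the bucket (an assumption for illustration); the placeholders match those used above.
import pandas as pd
# Read the ingested Parquet file back from Amazon S3 using pandas and s3fs
# Note: we assume output.data_file_name is the file's path within the bucket
df = pd.read_parquet(
    f"s3://<BUCKET_NAME>/{output.data_file_name}",
    storage_options={
        "key": "<AWS_ACCESS_KEY_ID>",
        "secret": "<AWS_SECRET_ACCESS_KEY>"
    }
)
# Preview the first few rows of the 'date' data file
print(df.head())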
Want to learn more about how Open Data Blend for Python can help you integrate our datasets? Head over to the project's GitHub or PyPI page.
Follow Us and Stay Up to Date
Follow us on X and LinkedIn to keep up to date with Open Data Blend, open data, and open-source data analytics technology news. Be among the first to know when there's something new.
Blog hero image by Natasha Miller on Unsplash.