Blob hash migration guide and samples. (#17759)
Co-authored-by: jschrepp-MSFT <[email protected]>
jaschrep-msft and jaschrep-msft authored Jan 5, 2021
1 parent 1a95f19 commit bffbffd
Showing 2 changed files with 214 additions and 0 deletions.
88 changes: 88 additions & 0 deletions sdk/storage/Azure.Storage.Blobs/AzureStorageNetMigrationV12.md
@@ -23,6 +23,7 @@ Familiarity with the legacy client library is assumed. For those new to the Azur
- [Downloading Blobs from a Container](#downloading-blobs-from-a-container)
- [Listing Blobs in a Container](#listing-blobs-in-a-container)
- [Generate a SAS](#generate-a-sas)
- [Content Hashes](#content-hashes)
- [Additional information](#additional-information)

## Migration benefits
@@ -470,6 +471,93 @@ BlobSasBuilder sasBuilder = new BlobSasBuilder()
};
```

### Content Hashes

#### Blob Content MD5

V11 calculated the blob content MD5 and validated it on download by default, provided an MD5 was stored in the blob properties. Calculating and storing the MD5 on upload was opt-in. Note that the service neither generates nor validates this value; it is retained only for the client to validate against.

v11

```csharp
BlobRequestOptions options = new BlobRequestOptions
{
    ChecksumOptions = new ChecksumOptions()
    {
        DisableContentMD5Validation = false, // true to disable download content validation
        StoreContentMD5 = false // true to calculate content MD5 on upload and store property
    }
};
```

V12 does not have an automated mechanism for blob content validation. It must be done per-request by the user.

v12

```C# Snippet:SampleSnippetsBlobMigration_BlobContentMD5
// upload with blob content hash
await blobClient.UploadAsync(
    contentStream,
    new BlobUploadOptions()
    {
        HttpHeaders = new BlobHttpHeaders()
        {
            ContentHash = precalculatedContentHash
        }
    });

// download whole blob and validate against stored blob content hash
Response<BlobDownloadInfo> response = await blobClient.DownloadAsync();

Stream downloadStream = response.Value.Content;
byte[] blobContentMD5 = response.Value.Details.BlobContentHash ?? response.Value.ContentHash;
// validate stream against hash in your workflow
```
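
The last comment in the snippet above leaves the validation step to the caller. One way to do it is sketched below, using only `System.Security.Cryptography`. The class `ContentHashValidation` and the helper `ContentMatchesMd5` are hypothetical names for illustration, not part of the SDK, and treating a missing stored hash as a failed check is an assumption you may want to change.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class ContentHashValidation
{
    // Hypothetical helper (not an SDK API): hashes the remaining bytes of
    // `content` with MD5 and compares the result to `expectedMd5`.
    public static bool ContentMatchesMd5(Stream content, byte[] expectedMd5)
    {
        if (expectedMd5 == null)
        {
            // Assumption: with no stored hash there is nothing to validate
            // against, so report the check as failed.
            return false;
        }

        using (MD5 md5 = MD5.Create())
        {
            // ComputeHash reads the stream to its end.
            byte[] actual = md5.ComputeHash(content);
            return actual.SequenceEqual(expectedMd5);
        }
    }
}
```

A call would look like `ContentHashValidation.ContentMatchesMd5(downloadStream, blobContentMD5)`. Hashing consumes the stream, so copy the download into a seekable buffer first if the content is needed afterwards.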

#### Transactional MD5 and CRC64

Transactional hashes are not stored; they exist only for the duration of the request they are calculated for. The service verifies transactional hashes on upload.

V11 provided transactional hashing on uploads and downloads through opt-in request options. MD5 and Storage's custom CRC64 were supported. When enabled, the SDK calculated and validated these hashes automatically, and this worked with any upload or download method.

v11

```csharp
BlobRequestOptions options = new BlobRequestOptions
{
    ChecksumOptions = new ChecksumOptions()
    {
        // request fails if both are true
        UseTransactionalMD5 = false, // true to use MD5 on all blob content transactions
        UseTransactionalCRC64 = false // true to use CRC64 on all blob content transactions
    }
};
```

V12 does not currently provide this functionality. Users who issue their own individual upload and download requests (for example, staging blocks one at a time) can provide a precalculated MD5 on upload and read the MD5 from the response object on download. V12 currently offers no API to request a transactional CRC64.

```C# Snippet:SampleSnippetsBlobMigration_TransactionalMD5
// upload a block with transactional hash calculated by user
await blockBlobClient.StageBlockAsync(
    blockId,
    blockContentStream,
    transactionalContentHash: precalculatedBlockHash);

// upload more blocks as needed
// commit block list
await blockBlobClient.CommitBlockListAsync(blockList);

// download any range of blob with transactional MD5 requested (maximum 4 MB for downloads)
Response<BlobDownloadInfo> response = await blockBlobClient.DownloadAsync(
    range: new HttpRange(length: 4 * Constants.MB), // a range must be provided; here we use transactional download max size
    rangeGetContentHash: true);

Stream downloadStream = response.Value.Content;
byte[] transactionalMD5 = response.Value.ContentHash;
// validate stream against hash in your workflow
```
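
Because a transactional MD5 can only be requested for a range of at most 4 MB, validating a larger blob this way means issuing one ranged download per chunk. The range arithmetic might be sketched as follows. `TransactionalRanges` and `SplitIntoRanges` are hypothetical names, not SDK APIs; each `(Offset, Length)` pair is meant to be turned into an `HttpRange` for a download call like the one above.

```csharp
using System;
using System.Collections.Generic;

public static class TransactionalRanges
{
    // 4 MB: the maximum range size for a transactional MD5 download,
    // as stated in the snippet above.
    public const long MaxTransactionalRange = 4L * 1024 * 1024;

    // Hypothetical helper: yields consecutive (offset, length) pairs that
    // cover [0, blobLength) with no pair longer than maxRange.
    public static IEnumerable<(long Offset, long Length)> SplitIntoRanges(
        long blobLength,
        long maxRange = MaxTransactionalRange)
    {
        if (blobLength < 0) throw new ArgumentOutOfRangeException(nameof(blobLength));
        if (maxRange <= 0) throw new ArgumentOutOfRangeException(nameof(maxRange));

        for (long offset = 0; offset < blobLength; offset += maxRange)
        {
            // The final range is shortened to end exactly at blobLength.
            yield return (offset, Math.Min(maxRange, blobLength - offset));
        }
    }
}
```

For a 10 MB blob this yields three ranges: two of 4 MB and a final one of 2 MB. The MD5 returned for each ranged download covers only that range, so each chunk is validated on its own.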

## Additional information

### Samples
126 changes: 126 additions & 0 deletions sdk/storage/Azure.Storage.Blobs/samples/Sample03_Migrations.cs
@@ -8,11 +8,13 @@
using Azure.Identity;
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Sas;
using NUnit.Framework;
using System.Text;
using System.Threading.Tasks;
using System.Security.Cryptography;

namespace Azure.Storage.Blobs.Samples
{
@@ -608,5 +610,129 @@ public async Task SasBuilderIdentifier()
await container.DeleteIfExistsAsync();
}
}

        [Test]
        public async Task BlobContentHash()
        {
            string data = "hello world";
            using Stream contentStream = new MemoryStream(Encoding.UTF8.GetBytes(data));

            // precalculate hash for sample
            byte[] precalculatedContentHash;
            using (var md5 = MD5.Create())
            {
                precalculatedContentHash = md5.ComputeHash(contentStream);
            }
            contentStream.Position = 0;

            // setup blob
            string containerName = Randomize("sample-container");
            string blobName = Randomize("sample-file");
            var containerClient = new BlobContainerClient(ConnectionString, containerName);

            try
            {
                containerClient.Create();
                var blobClient = containerClient.GetBlobClient(blobName);

                #region Snippet:SampleSnippetsBlobMigration_BlobContentMD5
                // upload with blob content hash
                await blobClient.UploadAsync(
                    contentStream,
                    new BlobUploadOptions()
                    {
                        HttpHeaders = new BlobHttpHeaders()
                        {
                            ContentHash = precalculatedContentHash
                        }
                    });

                // download whole blob and validate against stored blob content hash
                Response<BlobDownloadInfo> response = await blobClient.DownloadAsync();

                Stream downloadStream = response.Value.Content;
                byte[] blobContentMD5 = response.Value.Details.BlobContentHash ?? response.Value.ContentHash;
                // validate stream against hash in your workflow
                #endregion

                byte[] downloadedBytes;
                using (var memStream = new MemoryStream())
                {
                    await downloadStream.CopyToAsync(memStream);
                    downloadedBytes = memStream.ToArray();
                }

                Assert.AreEqual(data, Encoding.UTF8.GetString(downloadedBytes));
                Assert.IsTrue(Enumerable.SequenceEqual(precalculatedContentHash, blobContentMD5));
            }
            finally
            {
                await containerClient.DeleteIfExistsAsync();
            }
        }

        [Test]
        public async Task TransactionalMD5()
        {
            string data = "hello world";
            string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
            List<string> blockList = new List<string> { blockId };
            using Stream blockContentStream = new MemoryStream(Encoding.UTF8.GetBytes(data));

            // precalculate hash for sample
            byte[] precalculatedBlockHash;
            using (var md5 = MD5.Create())
            {
                precalculatedBlockHash = md5.ComputeHash(blockContentStream);
            }
            blockContentStream.Position = 0;

            // setup blob
            string containerName = Randomize("sample-container");
            string blobName = Randomize("sample-file");
            var containerClient = new BlobContainerClient(ConnectionString, containerName);

            try
            {
                containerClient.Create();
                var blockBlobClient = containerClient.GetBlockBlobClient(blobName);

                #region Snippet:SampleSnippetsBlobMigration_TransactionalMD5
                // upload a block with transactional hash calculated by user
                await blockBlobClient.StageBlockAsync(
                    blockId,
                    blockContentStream,
                    transactionalContentHash: precalculatedBlockHash);

                // upload more blocks as needed

                // commit block list
                await blockBlobClient.CommitBlockListAsync(blockList);

                // download any range of blob with transactional MD5 requested (maximum 4 MB for downloads)
                Response<BlobDownloadInfo> response = await blockBlobClient.DownloadAsync(
                    range: new HttpRange(length: 4 * Constants.MB), // a range must be provided; here we use transactional download max size
                    rangeGetContentHash: true);

                Stream downloadStream = response.Value.Content;
                byte[] transactionalMD5 = response.Value.ContentHash;
                // validate stream against hash in your workflow
                #endregion

                byte[] downloadedBytes;
                using (var memStream = new MemoryStream())
                {
                    await downloadStream.CopyToAsync(memStream);
                    downloadedBytes = memStream.ToArray();
                }

                Assert.AreEqual(data, Encoding.UTF8.GetString(downloadedBytes));
                Assert.IsTrue(Enumerable.SequenceEqual(precalculatedBlockHash, transactionalMD5));
            }
            finally
            {
                await containerClient.DeleteIfExistsAsync();
            }
        }
}
}
