Storage 107 (With table (NoSQL and more))

STORAGE ACCOUNTS

  • Cloud storage (data objects, file system, messaging, NoSQL)
  • Replication across data centers or regions
  • Encrypted, with access control
  • Scalable to meet demand
  • Managed: Microsoft handles the hardware
  • Accessible over HTTP, HTTPS and the SDKs

A service for storing files, messages, tables and other data, including file shares.

Three main categories

  • Storage for VMs: Disks (persistent block storage for IaaS VMs), Files (managed file shares)
  • Unstructured data: Blobs (highly scalable, REST-based object store), Data Lake Store (Hadoop HDFS as a service)
  • Structured data: Tables (key/value, NoSQL), Cosmos DB (globally distributed database service), Azure SQL DB (database as a service, SQL)

General-purpose storage comes in two performance tiers: Standard (HDD) and Premium (SSD).

The storage services are accessed through a storage account:

  • Containers (scalable object store for text/binary data)
  • Files (shares)
  • Queues (message store between apps)
  • Tables (NoSQL)

Storage account kinds

  • Blob storage: blob only
  • General purpose V1: supported services are blob, file, queue, table and disk; supported performance tiers are Standard and Premium (use general-purpose V2 accounts instead when possible)
  • General purpose V2: supported services are blob, file, queue, table and disk; supported performance tiers are Standard and Premium (recommended for most scenarios using Azure Storage)
  • Block blob storage: blob only
  • File storage: files only; the only supported performance tier is Premium

Microsoft Azure supports 4 replication options: locally-redundant storage (LRS), zone-redundant storage (ZRS), geo-redundant storage (GRS) and read-access geo-redundant storage (RA-GRS).

General purpose V2 replication options: LRS, GRS, RA-GRS and ZRS. File storage replication options: LRS and ZRS (ZRS only in a limited set of regions).

Replication strategies

Data in a storage account is always replicated for durability and availability. Replication copies the data to protect it against planned and unplanned events such as hardware failure, network or power outages, natural disasters and so on.

LRS (Locally redundant storage): the lowest-cost replication option, with the least durability. If a disaster hits the data center, all replicas can be lost or unrecoverable.

ZRS (Zone redundant storage): replicates data across 3 storage clusters in a region. Each cluster is physically separated from the others and sits in its own availability zone. If one zone goes down, the data can still be accessed from another. Good performance and low latency. Note: ZRS is not available in all regions, changing to or from ZRS means the data has to be moved physically, and ZRS may not protect the data against a regional fault that takes out multiple zones.

GRS (Geo redundant storage): replicates data to a second region, far away from the primary region. GRS costs more than LRS but protects against a regional fault. In the second region the data is replicated with LRS. Designed durability: at least 99.99999999999999% (16 9’s).

Two options

  • GRS: two data centers in two regions, but Microsoft must initiate the failover before the secondary can be read
  • RA-GRS (read access): GRS, plus the option to read from the secondary region

GZRS (Geo-zone-redundant storage): ZRS (zone redundant storage) and GRS (geo redundant storage) combined. For apps that need consistency, durability, performance and availability, and that must be able to read data in case of a disaster. Designed durability: at least 99.99999999999999% (16 9’s).
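To compare the options side by side, here is a minimal Python sketch. The class, field and function names are made up for this illustration; the values only restate the notes above, so treat it as a memory aid and not as an official matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    zones_in_primary: int      # 1 = one data center, 3 = three availability zones
    secondary_region: bool     # data is also copied to a second region
    read_from_secondary: bool  # secondary is readable without a failover


# Hypothetical lookup table built from the notes above.
OPTIONS = {
    "LRS": Replication(1, False, False),
    "ZRS": Replication(3, False, False),
    "GRS": Replication(1, True, False),
    "RA-GRS": Replication(1, True, True),
    # RA-GZRS is the read-access variant of GZRS; it is not covered in these notes.
    "GZRS": Replication(3, True, False),
}


def survives(option: str, failure: str) -> bool:
    """Return True if a copy of the data remains after the given failure."""
    r = OPTIONS[option]
    if failure == "datacenter":
        return r.zones_in_primary > 1 or r.secondary_region
    if failure == "region":
        return r.secondary_region
    raise ValueError(f"unknown failure type: {failure}")


if __name__ == "__main__":
    for name in OPTIONS:
        print(name, survives(name, "datacenter"), survives(name, "region"))
```

Only RA-GRS in this table can be read from the secondary without waiting for a failover.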

Accessing storage

Each service (blob, table, queue and file) gets its own unique URL.

http://nameofaccount.blob.core.windows.net
-- Where blob can be substituted with table, queue or file
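The URL pattern is easy to build in code. A small Python sketch follows; the function names are my own, and it assumes the public core.windows.net suffix shown above.

```python
from urllib.parse import quote

SERVICES = ("blob", "table", "queue", "file")


def endpoint(account: str, service: str = "blob", scheme: str = "https") -> str:
    """Build the base URL of one storage service in an account."""
    if service not in SERVICES:
        raise ValueError(f"service must be one of {SERVICES}")
    return f"{scheme}://{account}.{service}.core.windows.net"


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the URL of a single blob inside a container."""
    return f"{endpoint(account, 'blob')}/{container}/{quote(blob_name)}"


if __name__ == "__main__":
    for service in SERVICES:
        print(endpoint("nameofaccount", service))
```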

A custom domain can be mapped with a CNAME record.

(A CNAME is a Canonical Name record, or alias record. DNS resolves a system’s domain name to its IP address, but sometimes more than one domain name resolves to the same IP address, and this is where the CNAME is useful.

A machine can have an unlimited number of CNAME aliases, but a separate CNAME record must be in the database for each alias. The major advantage of using a CNAME is that if we change the IP address in one A record, every CNAME record pointing to that host follows the change.)

Storage endpoints can be secured with the firewall and virtual networks function, which allows connections only from specific VNets or subnets. The networks must be in the same region as the storage account.

Blob storage

Unstructured data stored as objects: text or binary, such as an MSI file, a video or a document. Common uses of this storage are serving images, streaming video/audio, backup and recovery, archiving, and distributed access, e.g. install files.

Here we created a blob container in the resource group.

Since we have made the access private, we get a “Resource not found” when visiting the URL of the image:

https://boosestorage.blob.core.windows.net/boosecontainer/3nf.jpg

Let’s change the access level to Blob (anonymous read access for blobs only).

Public access level: by default a container is private to the account owner.

  • Private = no anonymous access to container or blob
  • Blob = allow anonymous public read for blobs only
  • Container = allow anonymous public read and list access to the container and its blobs
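The three levels can be read as a small permission table. This is only a Python sketch of the rules listed above; the operation names `read_blob` and `list_blobs` are invented for the example.

```python
# Hypothetical operation names; the mapping restates the three levels above.
ANONYMOUS_RIGHTS = {
    "private": set(),
    "blob": {"read_blob"},
    "container": {"read_blob", "list_blobs"},
}


def anonymous_allowed(level: str, operation: str) -> bool:
    """Return True if an anonymous caller may perform the operation."""
    return operation in ANONYMOUS_RIGHTS[level]


if __name__ == "__main__":
    for level in ANONYMOUS_RIGHTS:
        print(level, anonymous_allowed(level, "read_blob"))
```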

Blob access tiers:

  • Hot: frequent access, highest storage cost; new storage accounts default to hot (view picture below)
  • Cool: storing large amounts of infrequently accessed data, minimum 30 days; cost effective to store, but access is expensive
  • Archive: several hours of retrieval latency, minimum 180 days; cheaper to store than both of the others, but costly to access
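The minimum storage periods matter for cost, because deleting or moving a blob earlier is billed as an early deletion charge for the remaining days, as far as I understand the pricing. A Python sketch of that arithmetic, with made-up names and only the 30 and 180 day minimums taken from the notes:

```python
from datetime import date

# Minimum storage period in days per tier, from the notes above.
MIN_DAYS = {"hot": 0, "cool": 30, "archive": 180}


def early_deletion_days(tier: str, stored_on: date, deleted_on: date) -> int:
    """Days short of the tier minimum when the blob is deleted or moved."""
    kept = (deleted_on - stored_on).days
    return max(0, MIN_DAYS[tier] - kept)


if __name__ == "__main__":
    # A blob kept in cool for 10 days is 20 days short of the minimum.
    print(early_deletion_days("cool", date(2020, 10, 1), date(2020, 10, 11)))
```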

Lifecycle management

Uploading: pick a blob type.

  • Block blobs: for storing data such as text or binary (file, image, video)
  • Append blobs: like block blobs, but good for append operations (logs)
  • Page blobs (max 8 TB): good for reads and writes; VMs use these as data and OS disks

Upload tools:

  • AzCopy (command line)
  • Data Factory (with account key, shared access signature, service principal or managed identity)
  • Data Box Disk (transfers on-prem data when the dataset is large or the network is limited: receive the disks, copy the data, and ship them back)
  • Import/Export service, almost like Data Box Disk
  • Storage Explorer
  • A .NET library

Storage pricing is important and should be checked before creating a storage account.

Table (NoSQL)

Here we added a table, but how do we interact with it and the other storage services?

Use one of the SDKs or Storage Explorer:

https://docs.microsoft.com/en-us/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows#sign-in-to-azure

Download StorageExplorer.exe and install it, then connect to your account and sign in. You will then be presented with a view like this:

Now let’s view the table we just created; press the folder icon.

Here all the data we have access to is shown. For more information on tables, see

https://docs.microsoft.com/en-us/rest/api/storageservices/designing-a-scalable-partitioning-strategy-for-azure-table-storage

  • The PartitionKey property stores string values that identify the partition that an entity belongs to.
  • The RowKey property stores string values that uniquely identify entities within each partition.
  • The PartitionKey and the RowKey together form the primary key for the entity.
  • The Timestamp property provides traceability for an entity.
  • Partitions represent a collection of entities with the same PartitionKey values (read more about this from the above link)
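To make the key rules concrete, here is a toy in-memory model in Python. It is not the Azure SDK; `InMemoryTable` and the PLANT/TAG values are hypothetical and only show how PartitionKey and RowKey together identify an entity.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


class InMemoryTable:
    """Toy stand-in for a table; not the Azure SDK."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def insert(self, entity: Entity) -> None:
        # PartitionKey + RowKey form the primary key, so the pair must be unique.
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise KeyError(f"entity {key} already exists")
        stored = dict(entity)
        stored["Timestamp"] = datetime.now(timezone.utc)
        self._rows[key] = stored

    def get(self, partition_key: str, row_key: str) -> Entity:
        return self._rows[(partition_key, row_key)]

    def partition(self, partition_key: str) -> List[Entity]:
        # A partition is every entity sharing the same PartitionKey.
        return [e for (pk, _), e in self._rows.items() if pk == partition_key]


table = InMemoryTable()
table.insert({"PartitionKey": "PLANT-1", "RowKey": "TAG-1"})
table.insert({"PartitionKey": "PLANT-1", "RowKey": "TAG-2"})
table.insert({"PartitionKey": "PLANT-2", "RowKey": "TAG-1"})
```

The same RowKey (TAG-1) can appear in two partitions, because uniqueness only applies to the pair.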

Let’s add some items and add an extra column for one entity.

Here we added 3 entities, and for TAG-2 we added a column Value (the other entities do not get this column).

To run a query, press the Query tab. Here we ran a query with the = operator for PLANT-2 and got back just the one entity we have.
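Outside Storage Explorer, the same equality query is written as an OData filter (`eq`) in the Table REST API. The Python sketch below builds such a filter string and runs the equivalent filter over a plain list. I am guessing that PLANT-x was the PartitionKey and TAG-x the RowKey, and the Value of 42 is invented.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]

# Assumed layout of the three entities; only TAG-2 has the extra Value column.
rows: List[Entity] = [
    {"PartitionKey": "PLANT-1", "RowKey": "TAG-1"},
    {"PartitionKey": "PLANT-2", "RowKey": "TAG-2", "Value": 42},
    {"PartitionKey": "PLANT-3", "RowKey": "TAG-3"},
]


def odata_eq(field: str, value: str) -> str:
    """Build an OData equality filter; single quotes are doubled to escape them."""
    escaped = value.replace("'", "''")
    return f"{field} eq '{escaped}'"


def select_eq(entities: List[Entity], field: str, value: Any) -> List[Entity]:
    """In-memory equivalent of the = operator in the query tab."""
    return [e for e in entities if e.get(field) == value]


if __name__ == "__main__":
    print(odata_eq("PartitionKey", "PLANT-2"))
    print(select_eq(rows, "PartitionKey", "PLANT-2"))
```

Entities without the Value column are simply skipped by a filter on Value, since tables have no fixed schema.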

Security

Security features for building secure applications: shared access signatures, encryption and good practices.

Overview

  • Encryption
  • Authentication, AD and RBAC
  • HTTPS, SMB 3
  • Disk encryption
  • Shared access signature, granting access

Request for secure resources

Auth

  • AD: IAM (Identity and Access Management) and role-based access control (RBAC)
  • Shared key: access keys and more parameters
  • Shared access signature (SAS): access to a resource within a specific time window
  • Anonymous: public blob/container access, as shown above with the picture served over HTTPS

URI and SAS

When creating SAS you get a URI with parameters/tokens.

Parameters include the URI (endpoint), version, service, resource, IP range (only requests from this range are accepted) and more. A SAS can be issued for blob, file, queue and table resources.

So if we have a storage account, a blob container and an image, and create a SAS at service level:

After pressing “Generate SAS token and URL” we can use a browser to view the file. First the token, then the full URL:

sp=r&st=2020-10-08T11:10:26Z&se=2020-10-08T19:10:26Z&spr=https&sv=2019-12-12&sr=b&sig=1P1dzZB8STJRM3Q0lwjgBeb%2BvAu6g2zpzwLaCS9BdKY%3D

https://boosestorage.blob.core.windows.net/imageblob/cloud.jpg?sp=r&st=2020-10-08T11:10:26Z&se=2020-10-08T19:10:26Z&spr=https&sv=2019-12-12&sr=b&sig=1P1dzZB8STJRM3Q0lwjgBeb%2BvAu6g2zpzwLaCS9BdKY%3D
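The token is an ordinary query string, so it can be picked apart with the Python standard library. In the sketch below, `parse_sas` and `is_active` are names I made up. The fields are sp (permissions, r = read), st (start), se (expiry), spr (protocol), sv (version), sr (resource, b = blob) and sig (signature).

```python
from datetime import datetime, timezone
from typing import Dict
from urllib.parse import parse_qs, urlsplit

SAS_URL = (
    "https://boosestorage.blob.core.windows.net/imageblob/cloud.jpg"
    "?sp=r&st=2020-10-08T11:10:26Z&se=2020-10-08T19:10:26Z&spr=https"
    "&sv=2019-12-12&sr=b&sig=1P1dzZB8STJRM3Q0lwjgBeb%2BvAu6g2zpzwLaCS9BdKY%3D"
)

SAS_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_sas(url: str) -> Dict[str, str]:
    """Return the SAS query parameters as a flat dict (values URL-decoded)."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


def is_active(params: Dict[str, str], now: datetime) -> bool:
    """Check whether 'now' falls inside the st..se window (both are UTC)."""
    start = datetime.strptime(params["st"], SAS_TIME_FORMAT).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(params["se"], SAS_TIME_FORMAT).replace(tzinfo=timezone.utc)
    return start <= now <= expiry


if __name__ == "__main__":
    for name, value in parse_sas(SAS_URL).items():
        print(name, "=", value)
```

This only inspects the token on the client side; the actual authorization decision is made by Azure Storage.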

“A shared access signature is a signed URI that points to one or more storage resources and includes a token that contains a special set of query parameters. The token indicates how the resources may be accessed by the client. One of the query parameters, the signature, is constructed from the SAS parameters and signed with the key that was used to create the SAS. This signature is used by Azure Storage to authorize access to the storage resource.”

ref https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview
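The signing step in the quote can be sketched in Python with HMAC-SHA256 over a string built from the SAS parameters, keyed with the Base64-decoded account key. The string-to-sign here is deliberately simplified (fields joined with newlines) and is not the exact layout Azure expects for a given version, and the key is a made-up placeholder.

```python
import base64
import hashlib
import hmac
from typing import Sequence

# Placeholder key for the demo only; a real account key comes from the portal.
DEMO_KEY_B64 = base64.b64encode(b"not-a-real-account-key").decode("ascii")


def string_to_sign(fields: Sequence[str]) -> str:
    """Simplified stand-in: the real field order and count depend on the SAS version."""
    return "\n".join(fields)


def sign(message: str, account_key_b64: str) -> str:
    """HMAC-SHA256 over the message, keyed with the decoded account key, Base64 encoded."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


fields = ["r", "2020-10-08T11:10:26Z", "2020-10-08T19:10:26Z", "https", "2019-12-12", "b"]
signature = sign(string_to_sign(fields), DEMO_KEY_B64)
```

Because the signature covers the parameters, changing any of them (for example the permissions) without the key invalidates the token.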

For an account-level SAS you must select the allowed resource types; at least one must be selected.

Encryption is enabled by default for all new and existing accounts and cannot be disabled.

Managed keys

Access keys authenticate an application’s requests. Key Vault can be used for storing the keys. The storage account and the key vault must be in the same region.

Security best practice

If a SAS is stolen, anyone can use it.

  • Use HTTPS
  • Stored access policy: set the expiration well into the future and check it regularly. If it expires, new SAS tokens must be generated.
  • Ad hoc or test SAS: use a short expiration time, so a stolen SAS is only usable for a short while
  • Have clients renew the SAS before it expires
  • SAS start time: it is given in UTC, so depending on your time zone a SAS may not work for the first few hours
  • The account owner pays for all usage, including usage through a SAS: upload 100 GB to a blob and read it 5 times, and the owner pays for 500 GB of egress
  • Validate: check the data written through a SAS; it can have issues, and it must be correct before the app can use it
  • Is a SAS the best way to go? If the risk is too high, do not write directly to storage; write to a lower-tier storage first and check the data before moving it (audit, policy). If everything should be public (blob / file), make the container / storage public instead of handing a SAS to every connection.
  • Storage Analytics for monitoring
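Two of the points above are easy to sketch in Python: a short-lived SAS window expressed in UTC, and the egress arithmetic. The function names are hypothetical, and the 15 minute backdating of the start time is my assumption of a sensible allowance for clock differences, not something taken from these notes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

SAS_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def short_lived_window(
    now: datetime,
    lifetime: timedelta = timedelta(hours=1),
    skew: timedelta = timedelta(minutes=15),
) -> Tuple[str, str]:
    """Return (st, se) strings in UTC, with the start slightly backdated."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    start = now_utc - skew
    expiry = now_utc + lifetime
    return start.strftime(SAS_TIME_FORMAT), expiry.strftime(SAS_TIME_FORMAT)


def egress_gb(blob_size_gb: float, reads: int) -> float:
    """Total data read out of the account, which the owner pays for."""
    return blob_size_gb * reads


if __name__ == "__main__":
    print(short_lived_window(datetime.now(timezone.utc)))
    print(egress_gb(100, 5))
```

Converting to UTC before formatting avoids the case where a local time is written into st and the SAS only becomes valid hours later.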

Files and File sync

Azure Files uses SMB.

“The Server Message Block (SMB) Protocol is a network file sharing protocol, and as implemented in Microsoft Windows is known as Microsoft SMB Protocol. The set of message packets that defines a particular version of the protocol is called a dialect. The Common Internet File System (CIFS) Protocol is a dialect of SMB. Both SMB and CIFS are also available on VMS, several versions of Unix, and other operating systems.”

ref https://docs.microsoft.com/en-us/windows/win32/fileio/microsoft-smb-protocol-and-cifs-protocol-overview?redirectedfrom=MSDN

  • Azure Files: SMB, client libraries and REST. Use it when moving a system to the cloud that already uses the native file system. Files are true directory objects. (Can be mounted on Windows, Linux and macOS.)
  • Azure Blobs: client libraries and REST. Use it when the app must support streaming and random access. Blobs live in a flat namespace.

TCP port 445 (SMB) must be open.
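Many ISPs and corporate firewalls block outbound port 445, so it is worth testing before trying to mount a share. Below is a minimal Python check using only the standard library; `tcp_port_open` is my own helper name, and the host in the usage note is just the account name pattern from earlier.

```python
import socket


def tcp_port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Usage would look like `tcp_port_open("nameofaccount.file.core.windows.net")`. A True result only says the port is reachable, not that the credentials or the share are valid.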

Snapshots store the state of the files at a point in time, as a read-only copy of the data. If you want to delete a share, you must first delete all of its snapshots.

File sync, well I need a coffee now, read about it when you need it.

Managing storage

  • Storage Explorer, shown above
  • Import/Export: shipping disks to Microsoft
  • AzCopy (or use robocopy)