Amazon Simple Storage Service, also known as Amazon S3, is an online object storage service. It is cheap, fast, and easy to set up. And since it is a service provided by e-commerce giant Amazon, the stored data is backed by the security features of the AWS platform.
Amazon S3 is one of the core services and among the pillars of the AWS cloud infrastructure. S3 stands for “Simple Storage Service” and is among the three foundational services that Amazon started with back in 2006. Almost all services in the AWS cloud infrastructure use this service in one way or another. In simple terms, Amazon S3 is an object store that works much like a regular file system on a computer but is “infinitely scaling,” as AWS advertises it. There’s no limit to the amount of data that can be stored on S3.
The key features of the S3 service are virtually unlimited and highly scalable storage, high availability, high durability, manageability, and security. Even with unlimited storage on offer, it is one of the most cost-effective services thanks to its pay-as-you-use model. It can be used to store all kinds of data, including images, text, and multimedia.
The article covers the following sections:
- Amazon S3 overview.
- S3 Buckets and Objects overview.
- S3 Advanced features.
- S3 Popular use cases.
The introduction to AWS Services is a mini-series containing articles that provide a basic introduction to different AWS services. Each article covers a detailed guide on how to use the AWS Service. This series aims at providing “A Getting Started Guide on Different AWS Services.”
Amazon S3 overview
Amazon S3 allows users to build highly scalable applications with virtually unlimited but inexpensive storage. Because it is fast, affordable, and reliable, S3 serves as the backbone of many cloud-based applications. It is also an integral service in the AWS cloud infrastructure, with integration options available for most services. Users can store or retrieve any amount of data per their needs at any time from around the globe.
In S3, everything is stored as an object. Objects typically hold text-based data such as server logs or multimedia data such as images, videos, or MP3 files. The data is stored with 99.99% availability, which amounts to about 53 minutes of downtime in a year, and 99.999999999% durability, which means that if you store 10,000,000 objects, you can expect to lose a single object on average once every 10,000 years.
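The arithmetic behind these two figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the durability calculation assumes the usual reading of eleven 9s as an average annual loss rate per object, applied to a collection of 10,000,000 objects.

```python
# Back-of-the-envelope arithmetic for the availability and durability figures.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def years_per_lost_object(durability_percent: float, object_count: int) -> float:
    """Average number of years between single-object losses (assumed model)."""
    annual_loss_rate = 1 - durability_percent / 100
    return 1 / (annual_loss_rate * object_count)


print(round(downtime_minutes_per_year(99.99), 2))               # about 52.56 minutes
print(round(years_per_lost_object(99.999999999, 10_000_000)))   # about 10,000 years
```

The 10,000-year figure only makes sense for a large collection of objects; for a single object, the same rate implies a far longer expected time before a loss.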
The service can be accessed via the management console, which gives users a well-built and easy-to-use interface for managing all the different features and options. In addition to the console, the AWS CLI and SDKs for S3 are available for programmatic access.
S3 also offers several more advanced features:
- Object encryption for data security.
- Versioning and replication for backup and durability.
- Storage classes for cost optimization based on object access patterns.
- Auditing, monitoring, and compliance features.
- Permissions and access management at the user, bucket, and even object level for restricted access.
- Query-in-place support, which lets other AWS services query the objects directly.
S3 Buckets and Objects overview
With a general overview of the S3 service in place, we can now dive into how objects are structured in S3. S3 follows the convention of buckets and objects.
A bucket in S3 is like a directory that can hold an unlimited number of objects (files). By default, each account can have 100 buckets (a soft limit that can be raised by request to AWS). There is no cap on the number of objects in one bucket, but each object can be at most 5 TB in size.
S3 is a regional service, and buckets are saved at the regional level. Each bucket has a globally unique name (bucket names follow a convention), meaning that only one bucket can have a given name at a time in the entire AWS infrastructure; the name becomes available again only after the owner deletes the bucket. Buckets are created in a specific region, so selecting a geographically closer region can help minimize latency.
An object in S3 is just a file that is stored. Objects consist of data and metadata. Each object has a unique key that consists of a prefix and the object name. E.g., s3://s3bucket/some_folder/s3_object.txt
The bucket name, key, and version ID together uniquely identify the object. Although we mentioned earlier that buckets are like directories, and the UI on the management console even shows a directory structure, there is no concept of directories within buckets, nor do any directories exist as they do in a Linux file system.
The directories shown in buckets are only there for data organization purposes. S3 works entirely with keys. S3 objects can be up to 5 TB in size but are unlimited in number. Objects also have additional attributes like metadata, version, and tags.
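To make the "keys, not directories" idea concrete, here is a small pure-Python sketch of how a folder view can be derived from flat keys using a prefix and a delimiter. It imitates the general behavior of an S3 listing with a delimiter but does not call S3; the function name `list_folder` and the sample keys are hypothetical.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split the keys under `prefix` into direct objects and folder-like prefixes."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


keys = [
    "some_folder/s3_object.txt",
    "some_folder/images/cat.png",
    "some_folder/images/dog.png",
    "readme.txt",
]

print(list_folder(keys))                  # top level: one object, one "folder"
print(list_folder(keys, "some_folder/"))  # inside some_folder/
```

The "folders" exist only as shared key prefixes: removing the last key that starts with `some_folder/images/` makes that folder disappear from the listing.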
Objects Consistency Model:
The S3 service provides atomic updates to objects, meaning an object is never partially updated. Since December 2020, S3 has delivered strong read-after-write consistency for all objects at no extra cost. Before that change, the consistency model for S3 was based on two approaches:
Read-after-write consistency for new objects: A newly created object could be read right after the write. The exception was a key that had been requested before the object existed; in that case S3 might return a 404 for a short while, but eventually the object became available.
Eventual consistency for object overwrites: If an object was retrieved right after an update, S3 might return the old copy or the new one but never a partially updated object.
Buckets, objects, and keys together make up the entire working structure of the S3 service. The advanced features for buckets and objects are discussed below:
S3 Advanced features
Although S3 is a foundational service, it offers many advanced features, which are listed as follows:
S3 Storage Classes and Life Cycle Management
Not all objects in a bucket have the same access patterns; some are accessed more frequently than others. Considering this, the S3 service allows objects to be transitioned into different storage classes depending on their access patterns to help reduce the storage cost. This can be done either manually, via the management console or the SDK, or automatically, by setting a life cycle configuration on the bucket that transitions objects to a different storage class.
S3 offers various storage classes that can be used depending on the object’s access patterns.
S3 Standard: This is the default storage class for all objects. This storage class is suitable for frequently accessed objects (such as images) and performance-sensitive use cases.
S3 Reduced Redundancy Storage (RRS): This storage class is also for frequently accessed objects but is more suitable for non-critical data that is easily reproducible if lost. AWS recommends not using RRS.
S3 Intelligent-Tiering: This storage class is suitable for use cases to optimize storage costs automatically. Depending upon the changes in object access patterns, the S3 Intelligent-Tiering moves data automatically to a frequent access tier or low-cost infrequent access tier. A small monthly automation and monitoring fee is associated with each object for this storage class. No additional tiering fees or object retrieval fees are related to this storage class. Ideal for use cases having long-lived data with unknown and unpredictable access patterns.
S3 Standard-IA and S3 One Zone-IA: These storage classes are suitable for long-lived, infrequently accessed objects such as backup files. The objects are available at millisecond access times, but an object retrieval fee is associated with these storage classes.
The difference between S3 Standard-IA and S3 One Zone-IA is that the former stores data across multiple availability zones redundantly while the latter stores data in only one AZ.
Hence One Zone-IA is less expensive, but it is also less resilient and less available than Standard-IA, and more prone to the physical loss of data.
S3 Glacier and S3 Glacier Deep Archive: These storage classes are suitable for archiving data and have the lowest cost of all the above storage classes. Data in Glacier and Glacier Deep Archive is not readily available, and data retrieval can take from 1 minute to 12 hours in the case of Glacier and about 12 hours in the case of Glacier Deep Archive. These storage classes are suitable for data that rarely needs to be accessed. A detailed analysis of S3 storage classes by AWS is given below:
S3 Storage Classes Comparison
For more details on S3 storage and classes, visit the S3 documentation.
Life Cycle Management
S3 can transition objects into different storage classes as needed to reduce costs. A life cycle configuration (a set of rules) is applied to the objects, and S3 performs the actions defined in that configuration. Two types of actions are performed:
Transition actions: These actions transition objects from one storage class to another after a specified time interval, e.g., after seven days or 15 days. Additional costs are associated with life cycle transition requests.
Expiration actions: These actions delete the objects after the specified interval.
Life cycle management can be configured either on a bucket via the management console, or by providing an XML document of predefined actions (stored as an S3 sub-resource) to be performed on objects via API operations.
For more details on S3 life cycle management, visit the S3 documentation.
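As a rough illustration of what such a rule set contains, the sketch below builds a life cycle configuration as a plain Python dictionary with one transition-and-expiration rule. The field names are assumed to follow the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the function name, rule ID, prefix, and day counts are made-up values, and nothing here talks to AWS.

```python
import json


def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return a one-rule life cycle configuration as a plain dict (illustrative only)."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day counts must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",      # hypothetical rule name
                "Filter": {"Prefix": prefix},     # which objects the rule applies to
                "Status": "Enabled",
                # Transition actions: move objects to cheaper storage classes.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Expiration action: delete objects after the given number of days.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("logs/", 30, 90, 365)
print(json.dumps(config, indent=2))
```

With an SDK, a dictionary like this would be passed along with the bucket name; the validation step is only a convenience of this sketch and does not reproduce every constraint that S3 enforces.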
S3 Object Versioning
Another feature the S3 service provides is object versioning. Versioning is enabled at the bucket level (it is disabled by default), and in a bucket with versioning enabled, each S3 object has an auto-generated, non-editable version ID attached to it. Objects present before versioning was enabled have a null version ID, while new objects each receive a unique version ID.
S3 versioning allows multiple versions of an object to be stored and protects against accidental deletions and overwrites. It can also be used for archival purposes. The following images show the behavior of versioning for overwrites and deletes:
The original version always remains in the bucket while a new copy of the same object is placed with a new and unique version ID that becomes the current version.
For deletes, a delete marker is placed on the deleted object and becomes its current version. The object itself is not actually removed, but S3 returns a 404 error if the deleted object is retrieved.
To permanently delete a version, its version ID has to be passed along with the delete request. If the deleted version was the current one, the next most recent version becomes the current version.
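The overwrite and delete behavior described above can be mimicked with a toy in-memory model. This is not the S3 API: the class `VersionedBucket`, the counter-based version IDs, and the use of `KeyError` to stand in for a 404 are all assumptions made for illustration.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the real S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._counter = itertools.count(1)

    def _new_id(self):
        # Real S3 version IDs are opaque strings; a counter keeps the demo readable.
        return f"v{next(self._counter)}"

    def put(self, key, body):
        """Store a new version; earlier versions stay in the bucket."""
        version_id = self._new_id()
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        """Without a version ID, add a delete marker; with one, remove that version."""
        if version_id is None:
            marker_id = self._new_id()
            self._versions.setdefault(key, []).append((marker_id, DELETE_MARKER))
            return marker_id
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        """Return the current (or requested) version; KeyError plays the role of a 404."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            versions = [v for v in versions if v[0] == version_id]
        if not versions or versions[-1][1] is DELETE_MARKER:
            raise KeyError(f"404: {key}")
        return versions[-1][1]


bucket = VersionedBucket()
first = bucket.put("report.txt", "draft")
second = bucket.put("report.txt", "final")  # overwrite: "draft" is kept as an older version
marker = bucket.delete("report.txt")        # simple delete: only a marker is added
```

At this point a plain `bucket.get("report.txt")` raises the 404-like error, while `bucket.get("report.txt", version_id=first)` still returns the draft. Deleting the marker by its version ID makes the "final" version current again.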
S3 Object Replication
S3 replication allows objects in one bucket to be copied to another bucket, owned by the same or a different AWS account, within the same or another region. Object replication is configured by attaching a replication configuration to the source bucket; it consists of at least the destination bucket and an IAM role that S3 uses to copy the objects.
For object replication, we have cross-region object replication (CRR) and same region object replication (SRR). Other than data redundancy purposes, S3 replication can be used for various use cases such as:
- Replication of objects with different storage classes.
- Object replication with different ownership.
- S3 Replication Time Control (S3 RTC), which is designed to copy 99.99% of objects within the same or a different region in 15 minutes or less.
Some vital considerations are: Object versioning must be enabled on both the source and destination buckets for replication. Objects that were already in the bucket before the replication configuration was set are not copied automatically; replicating them requires a separate step, such as S3 Batch Replication or a request to AWS support.
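For orientation, a minimal replication configuration might look like the dictionary built below: an IAM role plus one rule that names the destination bucket. The structure is assumed to match what boto3's `put_bucket_replication` takes as its `ReplicationConfiguration` argument; the function name, rule ID, account number, role name, and bucket name are placeholders, and the sketch does not contact AWS.

```python
def build_replication_configuration(role_arn, destination_bucket, storage_class=None):
    """Return a single-rule replication configuration as a plain dict (illustrative only)."""
    destination = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if storage_class is not None:
        # Optionally store the replicas in a different storage class.
        destination["StorageClass"] = storage_class
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy the objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},                  # empty filter: apply to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": destination,
            }
        ],
    }


replication = build_replication_configuration(
    "arn:aws:iam::123456789012:role/example-replication-role",  # placeholder ARN
    "example-destination-bucket",                                # placeholder bucket
    storage_class="STANDARD_IA",
)
```

Whether the destination is in the same region (SRR) or another one (CRR) is determined by where the destination bucket lives, not by a separate setting in this structure.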
S3 Security and Encryption
Durability and availability are among S3's most attractive features: the service is designed for 99.999999999% (eleven 9s) durability and 99.99% availability. Along with the highly secure and fault-tolerant infrastructure provided by AWS, data in S3 is replicated across multiple availability zones to sustain the concurrent loss of data in two AZs.
S3 also provides user-based and resource-based security options to restrict access to specific users, buckets, and objects.
User-based restriction: IAM policies can be set to define specific actions a user can perform on S3.
Resource-based restriction: Resource-based restrictions can be applied to buckets or to objects within buckets for even finer control. These are defined via bucket policies and access control lists (ACLs). By default, S3 buckets are private, and public access has to be allowed explicitly, to the bucket via bucket policies and to objects via ACLs.
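As an example of a resource-based restriction, the sketch below assembles a bucket policy that allows anyone to read objects in one bucket, which is the kind of policy typically used for public content. The JSON layout follows the standard IAM policy format; the function name, statement ID, and bucket name are placeholders, and the snippet only builds and prints the document.

```python
import json


def public_read_policy(bucket_name):
    """Return a bucket policy granting public read access to every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",   # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",               # anyone, including anonymous users
                "Action": "s3:GetObject",       # read-only access to objects
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy = public_read_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

A policy like this exposes every object in the bucket, so in practice the `Resource` is often narrowed to a key prefix, and the bucket's public access settings also have to permit it.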
Data in S3 can be made more secure using the encryption options available in S3. Encryption can be either:
Server-Side: Unencrypted data is uploaded to S3, which encrypts it before physically storing it and decrypts it on access. By default, the S3 service manages the keys used for encryption. S3 also provides options for the client to supply its own encryption keys in this case.
Client-Side: Data is encrypted at the client’s end and uploaded to S3 in encrypted form. The client is responsible for key management in this case.
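The server-side options usually come down to a few extra parameters on the upload request. The helper below is a hedged sketch that returns those parameters as a dictionary; the parameter names are assumed to match the keyword arguments of boto3's `put_object`, while the helper itself, its mode labels, and the sample key ID are invented for this example.

```python
def encryption_parameters(mode, kms_key_id=None, customer_key=None):
    """Return extra upload parameters for a server-side encryption mode (illustrative)."""
    if mode == "SSE-S3":
        # Keys are created and managed entirely by S3.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys are managed in AWS KMS; a specific key may optionally be named.
        params = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            params["SSEKMSKeyId"] = kms_key_id
        return params
    if mode == "SSE-C":
        # The client supplies its own key with every request.
        if customer_key is None:
            raise ValueError("SSE-C requires a customer-provided key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown encryption mode: {mode}")


print(encryption_parameters("SSE-S3"))
print(encryption_parameters("SSE-KMS", kms_key_id="example-key-id"))
```

Client-side encryption needs no such parameters, since the data is already encrypted by the time it reaches S3.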
S3 popular use cases
Some popular use cases for AWS S3 are data backups, data storage, hybrid cloud storage, media hosting (images, music, and videos), application hosting, static website hosting, data archiving, building data lakes and running data analytics on them, and disaster recovery.
The article summarizes the working model of S3 and the various features offered by the service. S3 is a foundational service that can be used for many different applications. The features provided in S3 are very attractive, and the cost at which a virtually unlimited amount of data can be stored with high durability and accessibility makes it a go-to service for almost any architecture.
Article Credits: Adit Modi
Adit is a Cloud, DevOps & Big Data Evangelist | 4x AWS Certified | 3x OCI Certified | 3x Azure Certified | AWS Community Builder | AWS Educate Cloud Ambassador.