Beginner: Lesson 3 of 12

Storage Services

Master Cloud Storage for unstructured data, Persistent Disk for VM storage, and Filestore for shared file systems.

Simple Explanation (ELI5)

GCP offers three main storage types: Cloud Storage is like a giant filing cabinet for any files (images, videos, backups). Persistent Disks are hard drives attached to VMs. Filestore is a shared network drive that multiple VMs can access at once. Pick the right tool based on how many VMs need access and whether data should survive the VM.

Why Do We Need Different Storage Types?

  • Cloud Storage: Virtually unlimited capacity at a low cost per GB; well suited to archives, backups, and static artifacts
  • Persistent Disk: Fast, tied to a VM or shared within a zone, for databases and high-performance apps
  • Filestore: NFS-compatible shared file system, for apps needing centralized storage

Technical Explanation

1. Cloud Storage (Object Storage)

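Unstructured data (objects) stored in buckets. The namespace is flat: each object is addressed by its bucket and its name (key), and folder-like paths are only prefixes within that name. Designed for durability, availability, and massive scale.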

bash
# Create a bucket (the name must be globally unique; my-bucket is a placeholder)
gsutil mb gs://my-bucket

# Upload content
gsutil cp file.txt gs://my-bucket/

# List bucket contents
gsutil ls gs://my-bucket/

# Download an object
gsutil cp gs://my-bucket/file.txt local-file.txt

# Delete an object
gsutil rm gs://my-bucket/file.txt

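# Set bucket lifecycle (delete objects older than 30 days)
cat > lifecycle.json <<'EOF'
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 30}}]}
EOF
gsutil lifecycle set lifecycle.json gs://my-bucket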

Cloud Storage Classes

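Class | Retrieval | Storage cost (approx., per GB per month) | Use Case
Standard | Instant | $0.020 | Frequently accessed data
Nearline | Instant | $0.010 | Data accessed less than once a month (30-day minimum storage)
Coldline | Instant | $0.004 | Data accessed less than once a quarter (90-day minimum storage)
Archive | Instant | $0.0012 | Long-term backups, compliance (365-day minimum storage)

All four classes return data within milliseconds. The colder classes trade lower storage prices for per-GB retrieval fees and minimum storage durations.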
Best Practice

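Use lifecycle policies to move aging data from Standard to cheaper classes automatically. Nearline suits data accessed about once a month or less, and Coldline suits data accessed about once a quarter or less; data read weekly belongs in Standard. At the list prices above, moving data from Standard to Coldline lowers the storage rate by 80%.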
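
To make the savings concrete, here is a rough sketch that multiplies a data size by the per-GB monthly prices from the table above. The prices are taken from that table and treated as assumptions (real prices vary by region and change over time); the `monthly_cost` helper and the 1000 GB size are made up for illustration, and retrieval and operation fees are ignored.

```shell
# Approximate monthly storage cost per class.
# Prices are the list prices quoted in the table above (assumed, region-dependent).
monthly_cost() {
  # $1 = size in GB, $2 = price per GB per month
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

SIZE_GB=1000   # hypothetical data size

for entry in Standard:0.020 Nearline:0.010 Coldline:0.004 Archive:0.0012; do
  class="${entry%%:*}"   # text before the colon
  price="${entry##*:}"   # text after the colon
  echo "$class: \$$(monthly_cost "$SIZE_GB" "$price") per month for ${SIZE_GB} GB"
done
```

Under these assumptions, 1000 GB costs about $20.00 a month in Standard and $4.00 in Coldline, which is where the 80% figure comes from.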

2. Persistent Disk

Block storage attached to VMs. High IOPS, low latency. Available as Standard (HDD) or SSD. Can be zonal (single zone) or regional (replicated across zones).

bash
# Create a persistent disk
gcloud compute disks create my-disk \
  --size 100GB \
  --zone us-central1-a \
  --type pd-standard

# Attach to a VM
gcloud compute instances attach-disk my-vm \
  --disk my-disk \
  --zone us-central1-a

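# Format and mount (from inside the VM)
# Confirm the device name with lsblk first: /dev/sdd is only an example,
# and mkfs erases whatever is on the device you name
lsblk
sudo mkfs.ext4 -F /dev/sdd
sudo mkdir -p /mnt/data
sudo mount /dev/sdd /mnt/data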

Regional Persistent Disk

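Replicated synchronously across two zones in a region. If one zone fails, the data is still available in the other zone, so the disk can be attached to a VM there. A regional disk costs roughly twice as much per GB as a zonal disk of the same type.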
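
As a sketch of how a regional disk might be created, the helper below prints a `gcloud compute disks create` command with `--region` and `--replica-zones` instead of `--zone`. The function name `regional_disk_cmd`, the disk name, the zones, and the 200GB size are all assumptions chosen for illustration; the command is only printed, not executed.

```shell
# Print (do not run) a gcloud command that creates a regional persistent disk.
# All names, zones, and the size are hypothetical examples.
regional_disk_cmd() {
  # $1 = disk name, $2 = region, $3 and $4 = the two replica zones
  printf 'gcloud compute disks create %s --region %s --replica-zones %s,%s --size 200GB --type pd-ssd\n' \
    "$1" "$2" "$3" "$4"
}

regional_disk_cmd my-regional-disk us-central1 us-central1-a us-central1-b
```

Printing the command first makes it easy to review the zones before running it; piping the output to `sh` would execute it in a shell that has gcloud configured.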

3. Filestore (Managed NFS)

Shared file system accessible from multiple VMs over NFS. Useful for shared application data, machine learning workloads, and legacy apps expecting POSIX file systems.

bash
# Create a Filestore instance (Basic HDD tier, which has a 1 TB minimum)
# --network is required; "default" is an example network name
gcloud filestore instances create my-filestore \
  --zone=us-central1-a \
  --tier=BASIC_HDD \
  --file-share=name="share",capacity=1TB \
  --network=name="default"

# Look up the instance IP address and file share name
gcloud filestore instances describe my-filestore --zone=us-central1-a

# Mount from a VM (as root), replacing 10.x.x.x with the instance IP address
sudo mkdir -p /mnt/filestore
sudo mount -t nfs 10.x.x.x:/share /mnt/filestore
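
A mount made with `mount` does not survive a reboot. One common way to make it persistent is an `/etc/fstab` entry; the sketch below only prints such an entry. The helper name `fstab_entry` and the IP address 10.0.0.2 are hypothetical, and the share name matches the `share` used in the create command above.

```shell
# Build an /etc/fstab line for an NFS share.
# The IP address below is a made-up example; use the one reported by
# "gcloud filestore instances describe".
fstab_entry() {
  # $1 = Filestore IP address, $2 = share name, $3 = local mount point
  # _netdev tells the system to wait for networking before mounting
  printf '%s:/%s %s nfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

fstab_entry 10.0.0.2 share /mnt/filestore
```

To apply it on a VM, the printed line could be appended with `fstab_entry 10.0.0.2 share /mnt/filestore | sudo tee -a /etc/fstab`, followed by `sudo mount -a` to check that it mounts cleanly.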

Storage Comparison

Service | Type | Access | Durability | Cost
Cloud Storage | Object | HTTP/API | 99.999999999% | Lowest (tiered)
Persistent Disk | Block | VM attachment | 99.999% (zonal) | Medium
Filestore | File | NFS/POSIX | 99.9% | Higher

Best Practices

  • Cloud Storage: Use versioning for critical data, use retention policies with Bucket Lock for compliance, and use signed URLs for temporary access.
  • Persistent Disk: Use snapshots for backups, and encrypt with customer-managed keys if required.
  • Filestore: Provision capacity ahead of need, keep clients in the same VPC network and region as the instance, and back up regularly.

Interview Questions

Beginner

What is Cloud Storage and how does it compare to S3?

Cloud Storage is GCP's object storage service, comparable to Amazon S3 in features and pricing. Bucket names are globally unique in both. A Cloud Storage bucket belongs to a GCP project and can be regional, dual-region, or multi-region; an S3 bucket belongs to an AWS account and lives in a single region. The URI schemes differ (gs:// vs s3://), but the concepts are equivalent.

When should I use Nearline vs Standard storage?

Use Standard for data accessed daily or weekly, and Nearline for data accessed about once a month. The storage price drops by about 50% and retrieval latency is the same. The trade-offs are a per-GB retrieval fee and a minimum storage duration (30 days for Nearline): deleting an object earlier incurs an early deletion fee.
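
The early deletion fee can be estimated with a small calculation. This sketch assumes a simplified model: the fee equals the storage charge for the days remaining in the minimum duration, with a month counted as 30 days. The helper name `early_deletion_fee` and the example figures are illustrative, not taken from a billing document.

```shell
# Estimate an early deletion fee under a simplified model (an assumption):
# fee = size * monthly price * remaining days / 30
early_deletion_fee() {
  # $1 = size in GB, $2 = price per GB per month,
  # $3 = minimum storage duration in days, $4 = days actually stored
  awk -v gb="$1" -v price="$2" -v min="$3" -v stored="$4" 'BEGIN {
    remaining = min - stored
    if (remaining < 0) remaining = 0   # no fee once the minimum is met
    printf "%.2f\n", gb * price * remaining / 30
  }'
}

# 1000 GB of Nearline data ($0.010 per GB per month) deleted after 10 days
early_deletion_fee 1000 0.010 30 10
```

In this example the object is billed for the 20 days it did not stay, so short-lived data is usually cheaper to keep in Standard.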

What is a Persistent Disk?

Block storage attached to a VM. Like a hard drive in your computer, but in the cloud. It persists even if the VM is stopped. SSD (pd-ssd) is fast; Standard (pd-standard) is cheaper but slower.

What is Filestore?

Managed NFS file system. Multiple VMs can mount and access the same files simultaneously. Useful for shared application data or legacy apps needing POSIX file system semantics.

Can a Persistent Disk be attached to multiple VMs?

Only in a limited way. In read-write mode a disk normally attaches to one VM at a time; the same disk can be attached to several VMs in read-only mode. Regional disks replicate data across zones for failover, not for sharing between VMs. Filestore is the better choice for multi-VM shared storage.

Intermediate

How would you back up a Persistent Disk?

Create a snapshot: gcloud compute disks snapshot DISK_NAME --zone us-central1-a. Snapshots are incremental, so each one stores only the blocks changed since the previous snapshot, and they are stored in Cloud Storage. Restore by creating a new disk from the snapshot.
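
The backup and restore steps in this answer can be sketched as a short script. It assumes the `my-disk` name and zone from the earlier examples; the date-based snapshot name, the `-restored` suffix, and the `run` dry-run helper are hypothetical conveniences. By default the script only prints the gcloud commands.

```shell
# DRY_RUN=1 (the default) prints commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

DISK="my-disk"                       # disk name from the earlier example
ZONE="us-central1-a"
SNAPSHOT="${DISK}-$(date +%Y%m%d)"   # hypothetical naming scheme: disk name plus date

# Back up: create a snapshot of the disk.
run gcloud compute disks snapshot "$DISK" --zone "$ZONE" --snapshot-names "$SNAPSHOT"

# Restore: create a new disk from that snapshot.
run gcloud compute disks create "${DISK}-restored" --zone "$ZONE" --source-snapshot "$SNAPSHOT"
```

Restoring into a new disk leaves the original untouched, which allows the restored data to be checked before the VM is switched over.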

What is the cost implication of Cloud Storage lifecycle policies?

Moving data from Standard to Archive (for example, after 90 days) cuts the storage rate by about 94% ($0.020 to $0.0012 per GB per month). However, Archive has a 365-day minimum storage duration, so deleting data earlier incurs an early deletion fee. Lifecycle rules cost very little to run, and at enterprise scale the savings can be substantial.

Why use Regional Persistent Disk over Zonal?

A regional disk replicates data synchronously across two zones, so if one zone fails the disk can be attached to a VM in the other zone and the workload can continue. A zonal disk is cheaper but becomes unavailable if its zone fails. For production databases, regional disks are a common choice.

Real-world Scenarios

Scenario 1: Data Pipeline with Archive

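New data lands in Standard storage. A lifecycle policy moves it to Nearline after 30 days, to Coldline after 90 days, and to Archive after 365 days. This lowers storage cost while keeping the data queryable: objects in every class are served with the same low latency, and BigQuery can still query them through external tables, although colder classes add retrieval fees.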
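
The schedule in this scenario could be expressed as a lifecycle configuration along these lines. This is a sketch: the `matchesStorageClass` conditions are an added assumption so that each rule only affects objects in the preceding class, and `BUCKET` is a hypothetical environment variable. The file is always written, but it is applied only when `BUCKET` is set and gsutil is installed.

```shell
# Write a lifecycle configuration for the 30 / 90 / 365 day schedule.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}},
    {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
     "condition": {"age": 365, "matchesStorageClass": ["COLDLINE"]}}
  ]
}
EOF

# Apply only when a bucket is named and gsutil is available.
# BUCKET is a hypothetical variable, for example BUCKET=gs://my-bucket
if [ -n "${BUCKET:-}" ] && command -v gsutil >/dev/null 2>&1; then
  gsutil lifecycle set lifecycle.json "$BUCKET"
else
  echo "Wrote lifecycle.json (set BUCKET and install gsutil to apply it)"
fi
```

The `age` condition counts days since the object was created, so the three rules fire in sequence as an object gets older.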

Scenario 2: Database with Snapshots

A database runs on a VM with an SSD Persistent Disk. Daily snapshots serve as backups. If the disk is corrupted, restore by creating a new disk from the most recent snapshot, which typically takes minutes.

Summary

Cloud Storage is the cheapest, most durable option for unstructured data. Persistent Disk provides block storage for VMs. Filestore enables shared file systems for multi-VM applications. Use lifecycle policies to move cold data to cheaper classes such as Archive; storage is often a large part of the cloud bill and one of the easiest to reduce.
