IntermediateLesson 4 of 12

Database Services

Master Cloud SQL, BigQuery, Firestore, and Cloud Spanner for every database pattern.

Simple Explanation (ELI5)

GCP offers databases for different workloads: Cloud SQL is for traditional apps (MySQL, PostgreSQL, SQL Server). BigQuery is for analyzing huge amounts of data (petabytes). Firestore is for mobile/web apps with real-time sync. Cloud Spanner is a globally distributed database that scales like NoSQL but works like SQL. Pick based on your data type and query patterns.

Why Multiple Databases?

Technical Explanation

1. Cloud SQL

Managed MySQL, PostgreSQL, or SQL Server. Automated backups, replicas, failover, patches. ACID transactions. Relational schema.

bash
# Create a Cloud SQL instance
gcloud sql instances create my-db \
  --database-version MYSQL_8_0 \
  --tier db-f1-micro \
  --region us-central1

# Create a database
gcloud sql databases create mydb --instance=my-db

# Create a user
gcloud sql users create myuser --instance=my-db --password=PASSWORD

# Connect from Cloud Shell
gcloud sql connect my-db --user=root

2. BigQuery

Serverless data warehouse. SQL queries on petabyte-scale datasets in seconds. Pay only for bytes scanned. Built-in ML.

bash
# Load data into BigQuery
bq load --source_format=CSV my_dataset.my_table file.csv

# Run a query
bq query --use_legacy_sql=false '
  SELECT COUNT(*) FROM `project.dataset.table`
  WHERE date >= "2024-01-01"'

# Stream data into BigQuery
bq insert my_dataset.my_table '[{"timestamp":"2024-01-01","value":100}]'

3. Firestore

Document database with real-time sync. Mobile-first. Scales automatically. Collections and documents (JSON-like).

bash
// Write to Firestore
db.collection("users").doc("user123").set({
  name: "Alice",
  email: "alice@example.com"
})

// Real-time listener
db.collection("users").doc("user123")
  .onSnapshot(doc => console.log(doc.data()))

// Query
db.collection("users")
  .where("age", ">", 18)
  .limit(10)
  .get()

4. Cloud Spanner

Globally distributed relational database. ACID transactions across regions. Linearizable consistency. Expensive but eliminates replication lag.

bash
# Create a Spanner instance
gcloud spanner instances create my-instance \
  --config=regional-us-central1 \
  --nodes=1 \
  --display-name="My Instance"

# Create a database
gcloud spanner databases create mydb \
  --instance=my-instance \
  --ddl-file=schema.sql

Database Comparison

DatabaseTypeScaleLatencyUse Case
Cloud SQLRelationalTerabytesLowTraditional apps
BigQueryAnalyticsPetabytesSecondsData analysis
FirestoreDocumentTerabytesReal-timeMobile/web apps
Cloud SpannerRelationalPetabytesLowGlobal ACID

Interview Questions

Beginner

What is Cloud SQL and when should I use it?

Cloud SQL is a managed MySQL, PostgreSQL, or SQL Server instance. Use it for any app that needs a relational database (web backends, CRM, ERP). GCP handles patching, backups, replicas. Much easier than managing a VM-based database.

What is BigQuery used for?

BigQuery is for analyzing massive datasets quickly. It runs SQL queries in parallel across hundreds of nodes. Perfect for: business intelligence, log analysis, data science. You pay per byte scanned, not per instance.

What is Firestore?

Firestore is a managed document database with real-time sync. Collections contain documents (JSON-like). Changes sync instantly to all connected clients. Perfect for mobile/web apps, chat apps, collaborative tools.

When would I use Cloud Spanner over Cloud SQL?

Cloud SQL handles one region. Cloud Spanner replicates globally with strong consistency. If you need transactions across regions with zero replication lag, use Spanner. Otherwise, Cloud SQL is simpler and cheaper.

Can I migrate from MySQL to BigQuery?

Yes. Cloud SQL is for operational databases (OLTP). BigQuery is for analysis (OLAP). You export/load data from SQL into BigQuery. They serve different purposes; many apps use both.

Intermediate

How does Cloud SQL handle high availability?

Regional high availability creates a standby replica in a different zone. If primary fails, failover is automatic (usually 1-3 minutes). Backups occur automatically and can be restored to any point in time (PITR).

Why is BigQuery pricing based on bytes scanned?

BigQuery partitions and clusters data for efficiency. Queries only scan relevant partitions. Pricing encourages good schema design. You pay $6.25 per TB scanned. Using appropriate WHERE clauses and partition pruning significantly reduces cost.

What is the tradeoff between Firestore and Cloud Datastore?

Datastore is legacy. Firestore is the new document database with better performance, offline sync, and firestore-in-datastore compatibility mode. If starting fresh, always use Firestore.

Real-world Scenarios

Scenario 1: E-commerce Backend

Use Cloud SQL for transactional data (orders, inventory). Use BigQuery for analysis (sales trends, customer behavior). Real-time sync with Firestore for shopping cart.

Scenario 2: SaaS Analytics

Events streaminto BigQuery continuously. Analytic queries return results in seconds on billions of rows. Use Datafusion or Looker to visualize insights.

Summary

Cloud SQL covers traditional app database needs. BigQuery dominates analytics. Firestore is for real-time, mobile-first apps. Cloud Spanner suits global operations requiring strong consistency. Most enterprise apps use multiple databases for different workloads.