NoSQL - Document, Key-Value, Column, Graph - Databases

Twitter reached 50 million tweets a day. MySQL was their database, and it was becoming a bottleneck. The problem was not MySQL itself; it was the architecture. Building a user's timeline meant selecting every tweet from every account that user follows, sorting the result, and paginating it. That is a complex JOIN over tens of millions of rows. PostgreSQL and MySQL, at least in a single-primary setup, are not built for write throughput at that scale.
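To make the timeline query concrete, here is a minimal sketch of the read path described above (select, sort, paginate), using Python's built-in `sqlite3` module. The schema, table names, and sample rows are hypothetical and chosen only for illustration; they are not Twitter's actual schema.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical follows/tweets schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE follows (follower_id INTEGER NOT NULL,
                              followee_id INTEGER NOT NULL,
                              PRIMARY KEY (follower_id, followee_id));
        CREATE TABLE tweets  (id INTEGER PRIMARY KEY,
                              author_id INTEGER NOT NULL,
                              body TEXT NOT NULL,
                              created_at INTEGER NOT NULL);
        CREATE INDEX tweets_by_author ON tweets (author_id, created_at);
    """)
    # User 1 follows users 2 and 3.
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany(
        "INSERT INTO tweets (author_id, body, created_at) VALUES (?, ?, ?)",
        [(2, "bob #1", 100), (3, "carol #1", 200),
         (2, "bob #2", 300), (1, "alice #1", 400)],
    )
    return conn

# The work happens at read time: every timeline request joins the
# follow graph against the tweets table, sorts, and paginates.
TIMELINE_SQL = """
    SELECT t.body
    FROM follows AS f
    JOIN tweets  AS t ON t.author_id = f.followee_id
    WHERE f.follower_id = ?
    ORDER BY t.created_at DESC
    LIMIT ? OFFSET ?
"""

def read_timeline(conn, user_id, page_size=2, page=0):
    """Return one page of tweet bodies for the accounts user_id follows."""
    rows = conn.execute(TIMELINE_SQL, (user_id, page_size, page * page_size))
    return [body for (body,) in rows]
```

With three users this is instant. The cost shows up when the `follows` side has thousands of rows per user and the `tweets` side has tens of millions, because the join and sort are repeated on every page load.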

Their solution: Redis for caching timelines, Cassandra for append-only tweet storage, MySQL for user accounts. Three databases, each doing the job it suits. NoSQL was not born from "SQL is bad"; it was born from scale requirements and query patterns that SQL handles inefficiently.
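The division of labor above can be sketched in plain Python. This is a toy model under stated assumptions, not Twitter's implementation: in-memory structures stand in for the three stores, and the fan-out-on-write strategy (pushing each new tweet ID into followers' cached timelines) is one common way to use a timeline cache, assumed here for illustration. The class name, method names, and `TIMELINE_CAP` are all hypothetical.

```python
from collections import defaultdict, deque

TIMELINE_CAP = 3  # hypothetical cap on cached timeline length

class PolyglotFeed:
    """Toy model: three in-memory structures, one per storage role."""

    def __init__(self):
        # Stand-in for the relational account store: author -> followers.
        self.followers = defaultdict(set)
        # Stand-in for the append-only tweet store: index is the tweet ID.
        self.tweet_log = []
        # Stand-in for the timeline cache: user -> bounded list of tweet IDs.
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, body):
        """Append the tweet once, then push its ID to each follower's cache."""
        tweet_id = len(self.tweet_log)
        self.tweet_log.append((author, body))
        for follower in self.followers[author]:
            # appendleft keeps newest first; maxlen evicts the oldest entry.
            self.timelines[follower].appendleft(tweet_id)
        return tweet_id

    def read_timeline(self, user):
        """Read path: no join and no sort, just a lookup per cached ID."""
        return [self.tweet_log[tid][1] for tid in self.timelines[user]]
```

The design choice is to move work from read time to write time: a post costs one cache update per follower, and in exchange a timeline read becomes a cheap lookup instead of a join.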

The NoSQL hype cycle of 2009-2015 did some damage: startups chose MongoDB, Cassandra, or CouchDB "because that's what everyone does", not because they understood the tradeoffs. A few years later, when they tried to run transactions across collections or complex JOINs, they discovered that the "flexibility" had come at a high price. A number of companies in Israel moved from MongoDB to PostgreSQL between 2015 and 2020, not because MongoDB is bad, but because their use case (financial data, complex reporting) simply did not fit.

PostgreSQL has since gained JSONB with GIN indexing, declarative partitioning, and vector search through the pgvector extension. NewSQL databases such as CockroachDB showed that distributed SQL is possible. In 2025, the choice is not "SQL vs NoSQL"; it is "what is my access pattern, and what is my consistency requirement?".

Amazon, which built DynamoDB because it had write requirements that MySQL could not meet, still runs PostgreSQL and MySQL for large parts of its business. DynamoDB is built for a scale beyond what even Amazon had reached in 2008. That is not a benchmark for an early-stage startup; it is a lesson that tools should fit the problem, not the other way around.

SQL vs NoSQL - the honest tradeoff

Same data, different physics - pick by access pattern, not by hype.

SQL (PostgreSQL, MySQL): ACID, relations, declarative

+ Strong consistency and multi-row transactions
+ Foreign keys and constraints enforced by the engine
+ Mature query planner for declarative queries
- Horizontal scaling gets harder once the data outgrows a single node
- Schema migrations require coordination

NoSQL (MongoDB, Cassandra, DynamoDB): scale, flexibility, denormalized

+ Horizontal scaling built in
+ Flexible schema: write documents of any shape
+ High write throughput (Cassandra, DynamoDB)
- Eventual consistency is common, so race conditions are easier to introduce
- JOINs are absent or limited: denormalize, or join in application code
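The last tradeoff, denormalizing versus joining in application code, can be illustrated with plain Python dictionaries standing in for two document collections. The collection names, fields, and helper functions are hypothetical; this is a sketch of the two patterns, not any particular database's API.

```python
# Two "collections" kept normalized: orders reference users by ID.
users = {"u1": {"name": "Dana"}, "u2": {"name": "Omer"}}
orders = [
    {"id": "o1", "user_id": "u1", "total": 120},
    {"id": "o2", "user_id": "u2", "total": 80},
]

def orders_with_names_app_join(orders, users):
    """App-side join: read the orders, then look up each referenced user."""
    return [
        {"id": o["id"], "total": o["total"],
         "user_name": users[o["user_id"]]["name"]}
        for o in orders
    ]

def denormalize(orders, users):
    """Write-time denormalization: copy the user name into every order."""
    return [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]

def rename_user(users, denormalized_orders, user_id, new_name):
    """The cost of denormalizing: a rename must update every copy."""
    users[user_id]["name"] = new_name
    touched = 0
    for o in denormalized_orders:
        if o["user_id"] == user_id:
            o["user_name"] = new_name
            touched += 1
    return touched  # number of order documents rewritten
```

The app-side join pays an extra lookup on every read. The denormalized form makes reads a single fetch but turns one logical update into many writes, and without a multi-document transaction those writes are where stale copies can creep in.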
