Database


This is a program that you give little chunks of data; it stores them in some hopefully-efficient format on disk, and then allows you to ask questions about them. There is a remarkable amount of cleverness[->] involved in doing this safely and efficiently, even in the single-machine case.
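As a rough illustration of the "give it data, then ask questions" cycle, here is a minimal sketch using Python's built-in sqlite3 module. The pets table and its rows are invented for the example, and the in-memory database stands in for one stored on disk.

```python
import sqlite3

# In-memory database for illustration; pass a file path instead to store on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT, age INTEGER)")

# Give the database some little chunks of data (hypothetical rows).
conn.executemany(
    "INSERT INTO pets (name, species, age) VALUES (?, ?, ?)",
    [("Rex", "dog", 3), ("Tom", "cat", 5), ("Fido", "dog", 7)],
)
conn.commit()

# Ask a question about the stored data: which dogs are older than 4?
old_dogs = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM pets WHERE species = ? AND age > ? ORDER BY name",
        ("dog", 4),
    )
]
print(old_dogs)
```

The question is written declaratively: the query says what is wanted, and the database decides how to find it, which is where much of the cleverness lives.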

You may have heard the term "big data": all it really means is "my database is now too big to fit on one machine", and in that sense, it's been around since the 1950s[->]. It's distinct from traditional supercomputing (also known as High-Performance Computing, or HPC), because in the HPC world the volume of input data, though often considerable, is of secondary concern to the vast number of calculations to be performed on it. With the advent of web-scale data (think "every link on every page on the Web"), big data is increasingly common. But if you can't fit all your data on one machine then you have to spread it across many machines, and that brings you into the world of distributed systems, which was best described by Dante Alighieri in 1300:

"Abandon all hope,
ye who enter here".

The theoretical limitations on what is possible[->] are bad enough, but the practice is much worse[->]. As the distributed systems researcher James Mickens puts it[->]:

"When you debug a distributed system or an OS kernel, you do it
Texas-style. You gather some mean, stoic people, people who have
seen things die, and you get some primitive tools, like a compass
and a rucksack and a stick that’s pointed on one end, and you walk
into the wilderness and you look for trouble, possibly while using
chewing tobacco."

Examples: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, Cassandra, Riak, MongoDB, Amazon Redshift, Google Bigtable.

Next: BackendWebFramework


MilesGould writes:

My level of expertise: most of what I know about database internals comes from the link above, and from reading the lecture notes from the Edinburgh University databases course[->]. Here are some reading recommendations[->] that people were kind enough to give me on Twitter; I have yet to read most of them! On distributed systems, I'm even less well-informed: I've read a few blog posts, but I think this is a field where you really need to dive into the research literature, work through the maths and get your hands dirty. Here's a reading list[->] of important papers in the field compiled by the computer scientist Christopher Meiklejohn.

