Database
You may have heard the term "big data": all it really means is "my database is now too big to fit on one machine", and in that sense, it's been around since the 1950s[->]. It's distinct from traditional supercomputing (also known as High-Performance Computing, or HPC), because in the HPC world the volume of input data, though often considerable, is of secondary concern to the vast number of calculations to be performed on it. With the advent of web-scale data (think "every link on every page on the Web"), big data is increasingly common. But if you can't fit all your data on one machine, then you have to spread it across many machines, and that brings you into the world of distributed systems, which was best described by Dante Alighieri in 1300:
|
The theoretical limitations on what is possible[->] are bad enough, but the practice is much worse[->]. As the distributed systems researcher James Mickens puts it[->]:
|
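To make "spread it across many machines" a bit more concrete, here is a minimal sketch of one common approach, hash-based sharding, in Python. It is an illustration under stated assumptions rather than how any particular database does it: the machine names and the `machine_for`, `put` and `get` helpers are invented for this example, and each "machine" is just an in-memory dictionary.

```python
import hashlib

# Hypothetical machine names, invented for this illustration.
MACHINES = ["db-0", "db-1", "db-2"]

# One in-memory dict per machine stands in for a real database server.
shards = {name: {} for name in MACHINES}


def machine_for(key: str) -> str:
    """Choose the machine responsible for `key` by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo the
    # number of machines, so the same key always maps to the same machine.
    index = int.from_bytes(digest[:8], "big") % len(MACHINES)
    return MACHINES[index]


def put(key: str, value) -> None:
    """Store `value` on whichever machine owns `key`."""
    shards[machine_for(key)][key] = value


def get(key: str):
    """Look up `key` on the machine that owns it (None if absent)."""
    return shards[machine_for(key)].get(key)


put("https://example.com/", ["https://example.org/"])
print(machine_for("https://example.com/"), get("https://example.com/"))
```

Note what this sketch leaves out: nothing here copes with a machine crashing, a network link dropping messages, or a new machine being added (with plain modulo hashing, most keys would then move to a different machine). Those omissions are where the real difficulty of distributed systems lives.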
Examples: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, Cassandra, Riak, MongoDB, Amazon Redshift, Google Bigtable.
Next: BackendWebFramework
MilesGould writes:
My level of expertise: most of what I know about database internals comes from the link above, and from reading the lecture notes from the Edinburgh University databases course[->]. Here are some reading recommendations[->] that people were kind enough to give me on Twitter; I have yet to read most of them! On distributed systems, I'm even less well-informed: I've read a few blog posts, but I think this is a field where you really need to dive into the research literature, work through the maths and get your hands dirty. Here's a reading list[->] of important papers in the field compiled by the computer scientist Christopher Meiklejohn.