Database
This is a program that you give little chunks of data, like
* "Miles left a comment on post 12532132 at 16:32 with the text 'Hello, World!'",
which it stores in some hopefully-efficient format on disk, and which then allows you to ask questions about them, like
* "who commented on post 12532132, and what did they say?".
There is a [remarkable amount of cleverness](http://coding-geek.com/how-databases-work/) involved in doing this safely and efficiently, even in the single-machine case.

You may have heard the term "big data": all it really means is "my database is now too big to fit on one machine", and in that sense, it's been around [since the 1950s](http://blog.jgc.org/2012/10/the-great-railway-caper-big-data-in-1955.html). Big data is distinct from traditional supercomputing (also known as High-Performance Computing, or HPC), because in the HPC world the volume of input data, though often considerable, is of secondary concern to the vast number of calculations to be performed on it. With the advent of web-scale data (think "every link on every page on the Web"), big data is increasingly common.

But if you can't fit all your data on one machine, then you have to spread it across many machines, and that brings you into the world of distributed systems, which was best described by Dante Alighieri in the early 1300s:

|>> [[[ "Abandon all hope, _ ye who enter here". ]]] <<|

The [theoretical limitations on what is possible](https://en.wikipedia.org/wiki/CAP_theorem) are bad enough, but [the practice is much worse](https://aphyr.com/tags/jepsen). As the distributed systems researcher James Mickens [puts it](https://www.usenix.org/system/files/1311_05-08_mickens.pdf):

|>> [[[ "When you debug a distributed system or an OS kernel, you do it _ Texas-style. You gather some mean, stoic people, people who have _ seen things die, and you get some primitive tools, like a compass _ and a rucksack and a stick that’s pointed on one end, and you walk _ into the wilderness and you look for trouble, possibly while using _ chewing tobacco." ]]] <<|

Examples: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, Cassandra, Riak, MongoDB, Amazon Redshift, Google Bigtable.

Next: BackendWebFramework

----

MilesGould writes:

My level of expertise: most of what I know about database internals comes from the link above, and from reading the [lecture notes from the Edinburgh University databases course](http://www.inf.ed.ac.uk/teaching/courses/adbs/#slides). [Here are some reading recommendations](http://pozorvlak.dreamwidth.org/179039.html) that people were kind enough to give me on Twitter; I have yet to read most of them!

On distributed systems, I'm even less well-informed: I've read a few blog posts, but I think this is a field where you really need to dive into the research literature, work through the maths and get your hands dirty. Here's a [reading list](http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html) of important papers in the field compiled by the computer scientist Christopher Meiklejohn.
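To make the description at the top of this page a little more concrete, here is a minimal sketch of "give it chunks of data, then ask questions about them", using Python's built-in `sqlite3` module as the database. The table and column names, the helper functions (`open_db`, `add_comment`, `comments_on`, `shard_for`) and the use of CRC32 hash partitioning to decide which machine a post's comments would live on are all hypothetical, chosen for illustration; they are not how any of the systems listed above necessarily work.

```python
import sqlite3
import zlib


def open_db(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical comments table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        " author TEXT, post_id INTEGER, posted_at TEXT, body TEXT)"
    )
    # An index lets the database answer "who commented on post N?"
    # without scanning every row in the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS comments_by_post ON comments (post_id)"
    )
    return conn


def add_comment(conn, author, post_id, posted_at, body):
    """Store one little chunk of data."""
    with conn:  # commits the transaction on success, rolls back on error
        conn.execute(
            "INSERT INTO comments (author, post_id, posted_at, body)"
            " VALUES (?, ?, ?, ?)",
            (author, post_id, posted_at, body),
        )


def comments_on(conn, post_id):
    """Ask a question: who commented on this post, and what did they say?"""
    rows = conn.execute(
        "SELECT author, body FROM comments WHERE post_id = ? ORDER BY posted_at",
        (post_id,),
    )
    return rows.fetchall()


def shard_for(post_id, num_machines):
    """Pick which of num_machines would hold this post's comments.

    Hash partitioning (an assumed scheme): every comment on the same post
    lands on the same machine, so the question above only touches one machine.
    """
    return zlib.crc32(str(post_id).encode("utf-8")) % num_machines


if __name__ == "__main__":
    db = open_db()
    add_comment(db, "Miles", 12532132, "16:32", "Hello, World!")
    print(comments_on(db, 12532132))  # [('Miles', 'Hello, World!')]
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashes are salted per process, and every client needs to agree on which machine owns a post. Note that this only decides *where* data goes; it does nothing about machines failing or disagreeing, which is where the hard part of distributed systems begins.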