What is a Database?
Databases allow for the storage of structured information in a way that is easy to retrieve. Every app you use is powered by databases. Think of them like spreadsheets optimized for billions of operations.

Ever wondered how scanning an item at the grocery store checkout brings up its price? Basically, the scanner translates the information encoded in the bar code into a digital identifier, and the computer then matches that identifier to a database entry containing the item and its attributes, such as name and price.
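The lookup described above can be sketched in a few lines of Python using the built-in sqlite3 module. This is only an illustration: the table layout, the barcode values and the prices are made up for the example, and a real checkout system will look quite different.

```python
import sqlite3

# In-memory database standing in for the store's product catalogue.
# Table name, barcodes and prices are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (barcode TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("0000000000017", "Sushi", 8.99), ("0000000000024", "Green tea", 2.49)],
)

def lookup(barcode):
    """Return (name, price) for a scanned barcode, or None if it is unknown."""
    return conn.execute(
        "SELECT name, price FROM items WHERE barcode = ?", (barcode,)
    ).fetchone()

print(lookup("0000000000017"))  # the identifier the scanner would send
```

The scanner only supplies the identifier. Everything the cashier sees comes from the matching database row.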

We will get to how computers interact with databases. What matters more, for now, is the database's contents: the table that contains Sushi.
Yes, this is pretty much what a database is: a way to store spreadsheets. That simple. But then, why are database companies worth billions of dollars? Why do database developers earn well over six figures? It turns out that databases are very simple in essence but face extremely complex challenges at large scale, as we'll discover.
Databases are Super-Powered Spreadsheets
I assume you know Microsoft Excel, the most popular spreadsheet app. When you open Excel, you essentially see a database table:

Your Facebook profile is a row in a gigantic database that contains attributes such as your 'Name', 'Age', 'Gender', 'Location', 'Marital Status', etc. So, then, Facebook runs on Excel?
Not so fast, cowboy. Try to recall the last time you worked with colleagues on a shared Excel or Google Sheets file. You most likely met two huge challenges: someone modifying a field before you had finished a calculation, and awful slowness. On top of this, Excel files are bulky: 1 million rows of data require around 88 megabytes. If you took only the simple Facebook profile row I've described (in reality it is of course much more complex) for all 3 billion users, you would get a 264-gigabyte file, far more than a spreadsheet application on an ordinary computer can open.
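For the curious, here is the back-of-the-envelope arithmetic behind that file size, as a tiny Python sketch. It only reuses the two figures quoted in the text (88 megabytes per million rows, 3 billion users) and assumes decimal units, where 1 gigabyte is 1,000 megabytes.

```python
MB_PER_MILLION_ROWS = 88      # figure quoted in the text
USERS = 3_000_000_000         # figure quoted in the text, one row per user

millions_of_rows = USERS / 1_000_000           # 3,000 blocks of a million rows
total_mb = millions_of_rows * MB_PER_MILLION_ROWS
total_gb = total_mb / 1000                     # assuming decimal gigabytes

print(total_gb)  # 264.0
```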
The really amazing trait of databases is the algorithms that store data in a way that optimizes computer memory and disk usage while providing resilience and scalability to trillions of records (rows). Here are a few examples:
- Database systems cleverly keep the most frequently used values in memory for quick retrieval. This is called "caching".
- Mathematically optimized algorithms read and write information using the least computing power possible.
- Internal replication keeps copies of the data, so the information survives even if a disk crashes.
- "Locking" keeps operations in line (you wouldn't want a database transaction to finish before your paycheck was fully registered in your bank account, or another computer program to modify it halfway through).
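To make the paycheck example a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It shows the "all or nothing" behaviour of a transaction rather than locking in the strict sense, and the accounts table, the owner names and the amounts are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint forbids negative balances (an assumption for this sketch).
conn.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("employer", 1000), ("employee", 0)]
)
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back as well

def balances():
    """Return the current balances as a dictionary keyed by owner."""
    return dict(conn.execute("SELECT owner, balance FROM accounts"))

print(transfer("employer", "employee", 400), balances())
print(transfer("employer", "employee", 5000), balances())
```

In the second call the employee is credited first, then the debit fails because the employer cannot cover it. The database undoes the credit, so no money appears out of thin air.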
The best-known and most widely used databases are relational databases, which consist of interlinked tables. Think back to the Facebook profile. All your Facebook information cannot be stored in a single spreadsheet. It would look more like this:

Tables are linked with unique identifiers. If I want to 'Like' or 'Friend' someone, only the relevant tables need to be updated, which is easier for computers to handle. It's easier to update several smaller spreadsheets than to have everybody working on one giant file!
With that model, to ask questions such as "What does Pierre-Paul like?", you need to query the database. The language you need to master? The "Structured Query Language" (SQL, or "sequel").
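Here is a rough sketch of what such interlinked tables and that question could look like, using Python's built-in sqlite3 module. The table names (users, pages, likes), their columns and the sample rows are invented for the example and say nothing about how Facebook actually models its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT);
    -- Each row links one user to one page through their unique identifiers.
    CREATE TABLE likes (
        user_id INTEGER REFERENCES users(user_id),
        page_id INTEGER REFERENCES pages(page_id)
    );
    INSERT INTO users VALUES (1, 'Pierre-Paul'), (2, 'Alice');
    INSERT INTO pages VALUES (10, 'Sushi'), (11, 'Guitars');
    INSERT INTO likes VALUES (1, 10), (1, 11), (2, 11);
""")

def liked_by(name):
    """Answer "What does <name> like?" by joining the three tables."""
    rows = conn.execute(
        """
        SELECT pages.title
        FROM likes
        JOIN users ON users.user_id = likes.user_id
        JOIN pages ON pages.page_id = likes.page_id
        WHERE users.name = ?
        ORDER BY pages.title
        """,
        (name,),
    )
    return [title for (title,) in rows]

print(liked_by("Pierre-Paul"))  # ['Guitars', 'Sushi']
```

A new 'Like' only adds one small row to the likes table. The users and pages tables are left untouched.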
SQL Like You're Five
Learning SQL can be compared to learning the guitar. With four or five basic chords, you can play the majority of popular songs. In the database world, a handful of statements allow anybody to query data. It's as easy as applying filters on an Excel table! But mastery, just like with the guitar, takes an incredible amount of effort. The syntax would look like this if I wanted to ask the database: "What is Pierre-Paul Ferland's location?"
SELECT Location
FROM "Facebook Users"
WHERE Name = 'Pierre-Paul Ferland';
The syntax is quite close to actual human language. The queries rise in complexity when you have to gather information from multiple tables or when you need to update (write) the database. Some queries even allow you to create new databases, new tables, users and access policies!
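If you want to try this yourself, here is a small sketch that runs the same kind of query, followed by an update, with Python's built-in sqlite3 module. The two locations are placeholders I made up, and the table only has the two columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table name contains a space, so it is written in double quotes,
# which is how standard SQL marks an identifier (single quotes mark text values).
conn.execute('CREATE TABLE "Facebook Users" (Name TEXT, Location TEXT)')
conn.execute(
    'INSERT INTO "Facebook Users" VALUES (?, ?)',
    ("Pierre-Paul Ferland", "Montreal"),  # placeholder location
)

def location_of(name):
    """Read query: return the stored location, or None if the user is unknown."""
    row = conn.execute(
        'SELECT Location FROM "Facebook Users" WHERE Name = ?', (name,)
    ).fetchone()
    return row[0] if row else None

before = location_of("Pierre-Paul Ferland")

# Write query: change the value stored in the row.
conn.execute(
    'UPDATE "Facebook Users" SET Location = ? WHERE Name = ?',
    ("Quebec City", "Pierre-Paul Ferland"),  # placeholder location
)
after = location_of("Pierre-Paul Ferland")

print(before, "->", after)
```

The question marks are placeholders filled in by the driver, which avoids pasting user-supplied text directly into the query.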
SQL has been around for the better part of 50 years, and SQL masters are still in high demand. However, the rise of generative AI carries immense promise, as a tool like ChatGPT could allow anybody to ask the database questions in plain human language rather than learning SQL. Imagine if you could play a guitar song by simply humming the melody!
If AI has figured out SQL, then database specialists are obsolete, right? Well, databases still carry many challenges, the biggest of which, I will argue, is security.
Why is database security so important and so hard?
Databases store the most valuable information any organization holds. Any threat actor seeks to gain unauthorized access to them. Here are examples of cybercrime involving databases:
- Altering a bank database to steal money or property.
- Students changing their grades in their college's databases.
- Demanding a ransom from hospitals in exchange for not disclosing health information publicly.
With the move to public cloud computing, databases are now hosted in data centers accessible from the Internet. Human error has therefore become the leading cause of database leaks. By simply checking a box, a careless administrator can expose a database to the public!
Solutions all involve a "defence in depth" strategy that includes tight access controls, monitoring, and, with cloud databases, specific configuration policies that prevent accidental exposure.
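As a very rough idea of what such a configuration policy could check, here is a Python sketch that scans an inventory of database settings and flags the ones reachable from the public Internet. The inventory format and the field names (publicly_accessible, allowed_networks) are hypothetical and do not match any particular cloud provider's API.

```python
# Hypothetical inventory of database configurations.
# Field names are illustrative only, not a real cloud provider's schema.
databases = [
    {"name": "payroll", "publicly_accessible": False,
     "allowed_networks": ["10.0.0.0/8"]},
    {"name": "marketing", "publicly_accessible": True,
     "allowed_networks": ["0.0.0.0/0"]},
]

def exposed(dbs):
    """Return the names of databases that anyone on the Internet could reach."""
    return [
        db["name"]
        for db in dbs
        # 0.0.0.0/0 means "every IPv4 address", i.e. no network restriction.
        if db["publicly_accessible"] or "0.0.0.0/0" in db["allowed_networks"]
    ]

print(exposed(databases))  # ['marketing']
```

Run automatically and regularly, a check like this would catch the carelessly ticked box before an attacker does.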
Data health and data quality are among the biggest challenges, if not the biggest, facing large organizations, due to any or all of the following:
- Data was amassed in siloed databases.
- Enterprises grew by acquisition, with each subsidiary having its own data model.
- Databases were modelled around vendor limitations.
- Special characters are handled differently from one system to another (I have a hyphen in my name, and half of the banks have trouble dealing with it!).
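The hyphen problem is easy to reproduce. In the Python sketch below, two imaginary systems clean up names with slightly different rules, and the same customer no longer matches across them. Both functions are hypothetical and stand in for whatever each real system does internally.

```python
import re

def normalize_system_a(name):
    """Hypothetical legacy system: uppercase, keep only letters and spaces."""
    return re.sub(r"[^A-Z ]", "", name.upper())

def normalize_system_b(name):
    """Hypothetical newer system: uppercase, but hyphens are kept too."""
    return re.sub(r"[^A-Z \-]", "", name.upper())

name = "Pierre-Paul Ferland"
a = normalize_system_a(name)
b = normalize_system_b(name)

print(a)       # PIERREPAUL FERLAND
print(b)       # PIERRE-PAUL FERLAND
print(a == b)  # False: the two systems disagree on the same person
```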
To avoid extremely costly and error-prone database migrations, data often needs to be duplicated (or "mirrored") on intermediate databases that are tasked with keeping the data in sync between the legacy systems. Such proliferation of data becomes one of the biggest information security risks for companies, because it is extremely hard to keep track of data flows and to apply coherent access policies across databases.
All this to say, even if AI can master SQL, database administrators, developers and security specialists still have plenty of work to do to keep databases healthy and secure.