What is a Database?
Databases store structured information in a way that makes it easy to retrieve. Nearly every app you use is powered by one. Think of them as spreadsheets optimized for billions of operations.
Ever wondered how scanning an item at the grocery store's checkout gets you its price? The scanner translates the information encoded in the bar code into a digital identifier. The computer then uses that identifier to find the matching database entry, which contains the item and its attributes, such as name and price.
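Here is a minimal sketch of that lookup, using Python's built-in sqlite3 module and an in-memory database. The `products` table, its columns, the barcodes and the prices are all hypothetical, made up for illustration; a real point-of-sale system has its own schema.

```python
import sqlite3

# In-memory database standing in for the store's product catalogue (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (barcode TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("0123456789012", "Sushi", 9.99),  # made-up barcode and price
        ("0987654321098", "Green tea", 3.49),
    ],
)

def lookup(barcode):
    """Return (name, price) for a scanned barcode, or None if it is unknown."""
    return conn.execute(
        "SELECT name, price FROM products WHERE barcode = ?", (barcode,)
    ).fetchone()

print(lookup("0123456789012"))  # ('Sushi', 9.99)
```

Because `barcode` is the primary key, SQLite keeps an index on it, so the lookup stays fast even with millions of products.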
We will get to how computers interact with databases later. What matters for now is the database's contents: the table that contains the Sushi entry.
Yes, this is pretty much what a database is: a way to store spreadsheets. It's that simple. But then, why are database companies worth billions of dollars? Why do database developers earn well over six figures? It turns out databases are very simple in essence but, at large scale, face extremely complex challenges, as we'll discover.
Databases are Super-Powered Spreadsheets
I assume you know Microsoft Excel, the most popular spreadsheet app. When you open Excel, you essentially see a database table:
Your Facebook profile is a row in a gigantic database that contains attributes such as your 'Name', 'Age', 'Gender', 'Location', 'Marital Status', etc. So, does Facebook run on Excel?
Not so fast, cowboy. Try to recall the last time you worked with colleagues on a shared Excel or Google Sheets file. You most likely met two huge challenges: someone modifying a cell before you had finished a calculation, and awful slowness. On top of this, Excel files are voluminous: 1 million rows of data can take around 88 megabytes. If you took only the simple Facebook profile row I've described (in reality, of course, it is much more complex) for all of Facebook's 3 billion users, you would get a 264-gigabyte file, far more than Excel can handle: a single worksheet tops out at 1,048,576 rows.
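The back-of-the-envelope math behind those figures fits in a few lines of Python. It assumes, as above, roughly 88 megabytes per million rows and 3 billion users, and uses Excel's limit of 1,048,576 rows per worksheet; real row sizes depend on the columns.

```python
# Rough figures: ~88 MB per 1 million rows, 3 billion users
MB_PER_MILLION_ROWS = 88
USERS = 3_000_000_000
EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet

bytes_per_row = MB_PER_MILLION_ROWS * 1_000_000 / 1_000_000   # 88 MB spread over 1M rows
total_gb = USERS / 1_000_000 * MB_PER_MILLION_ROWS / 1_000    # MB -> GB (decimal units)
sheets_needed = -(-USERS // EXCEL_MAX_ROWS)                   # ceiling division

print(f"{bytes_per_row:.0f} bytes per row")
print(f"{total_gb:.0f} GB in total")
print(f"{sheets_needed:,} worksheets at Excel's row limit")
```

In other words, even before worrying about file size, the data would have to be split across roughly 2,862 full worksheets.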
What makes databases truly amazing is the algorithms that store data in a way that optimizes memory and disk usage, while providing resilience and scalability to trillions of records (rows). Here are a few examples:
- Database systems cleverly keep the most frequently used values in memory for quick retrieval; this is called "caching".
- Mathematically optimized algorithms read and write information using as little computing power as possible.
- Replication keeps copies of the data on several disks, so a disk can crash without any information being lost.
- "Locking" keeps operations in line: you wouldn't want a transaction to finish before your paycheck is fully registered in your bank account, or another program to modify your balance halfway through.
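To make the paycheck example concrete, here is a minimal sketch of a transaction using Python's sqlite3 module. The `accounts` table, the owners and the amounts are hypothetical. The deposit into the employee's account and the withdrawal from the employer's account either both happen or neither does: if the second step fails, the first one is rolled back. SQLite also locks the database during a write transaction, so another connection cannot write to it until the transaction ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts; the CHECK constraint forbids negative balances
conn.execute(
    "CREATE TABLE accounts ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("employer", 5000), ("employee", 100)]
)
conn.commit()

def pay(amount):
    """Move `amount` from employer to employee as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = 'employee'",
                (amount,),
            )
            # If this withdrawal breaks the CHECK constraint, the deposit above is undone too
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = 'employer'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY owner"))

print(pay(1000), balances())   # True {'employee': 1100, 'employer': 4000}
print(pay(10000), balances())  # False {'employee': 1100, 'employer': 4000}
```

The failed 10,000 payment leaves no trace: the employee never sees money the employer didn't actually send.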
The most well-known and widely used databases are relational databases, which consist of interlinked tables. Think back to the Facebook profile. Your Facebook information cannot all be stored in a single spreadsheet. It would look more like this:
Tables are linked through unique identifiers. When I 'Like' or 'Friend' someone, only the relevant tables need to be updated, which is easier for computers to handle: it's easier to update small spreadsheets than to have everybody working on one giant file!
With that model, to ask questions such as "What does Pierre-Paul like?", you need to query the database. The language you need to master? The "Structured Query Language" (SQL, or "sequel").
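Here is a minimal sketch of that model using Python's sqlite3 module: a `users` table and a `likes` table linked through each user's unique identifier, and a SQL query that answers "What does Pierre-Paul like?" by joining them. The table names, columns and rows (including the user "Alice") are hypothetical; Facebook's real data model is far more complex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,  -- the unique identifier that links the tables
        name    TEXT NOT NULL
    );
    CREATE TABLE likes (
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        page    TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'Pierre-Paul Ferland'), (2, 'Alice');
    INSERT INTO likes VALUES (1, 'Sushi'), (1, 'Guitar'), (2, 'Databases');
""")

def likes_of(name):
    """Answer "What does <name> like?" by joining users to likes on user_id."""
    rows = conn.execute(
        """
        SELECT likes.page
        FROM users
        JOIN likes ON likes.user_id = users.user_id
        WHERE users.name = ?
        ORDER BY likes.page
        """,
        (name,),
    )
    return [page for (page,) in rows]

print(likes_of("Pierre-Paul Ferland"))  # ['Guitar', 'Sushi']
```

Liking a new page only adds one small row to `likes`; the `users` table is not touched at all.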
SQL Like You're Five
Learning SQL can be compared to learning the guitar. With four or five basic chords, you can play the majority of popular songs. In the database world, a handful of statements let anybody query data. It's as easy as applying filters on an Excel table! But, just like with the guitar, mastery takes an incredible amount of effort. Here is what the syntax looks like if I want to ask the database: "What is Pierre-Paul Ferland's location?"
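```sql
-- In standard SQL, double quotes mark a table name containing a space; single quotes mark a text value
SELECT Location
FROM "Facebook Users"
WHERE Name = 'Pierre-Paul Ferland';
```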
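To try this kind of query yourself, here is a minimal sketch that runs it from Python against an in-memory SQLite database. The table contents are made up ("Example City" and "Alice" are placeholders). The name is passed through a `?` placeholder rather than pasted into the SQL string, which lets the database handle the quoting, hyphens and apostrophes included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes: the table name contains a space (hypothetical table and rows)
conn.execute('CREATE TABLE "Facebook Users" (Name TEXT, Location TEXT)')
conn.executemany(
    'INSERT INTO "Facebook Users" VALUES (?, ?)',
    [("Pierre-Paul Ferland", "Example City"), ("Alice", "Sample Town")],
)

def location_of(name):
    # The ? placeholder passes `name` as a value; the database does the quoting
    row = conn.execute(
        'SELECT Location FROM "Facebook Users" WHERE Name = ?', (name,)
    ).fetchone()
    return row[0] if row else None

print(location_of("Pierre-Paul Ferland"))  # Example City
```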
The syntax is quite close to actual human language. Queries grow in complexity when you have to gather information from multiple tables or when you need to update (write to) the database. Some queries even let you create new databases, new tables, users and access policies!
SQL has been around for the better part of 50 years, and SQL masters are still in high demand. However, the rise of generative AI carries immense promise: a tool like ChatGPT could let anybody ask the database questions in plain human language rather than learning SQL. Imagine if you could play a guitar song by simply humming the melody!
If AI figures out SQL, database specialists become obsolete, right? Well, databases still pose many challenges, the biggest of which, I will argue, is security.
Why is database security so important and so hard?
Databases store the most valuable information any organization holds, so threat actors constantly seek unauthorized access to them. Here are examples of cybercrime involving databases:
- Altering a bank database to steal money or property.
- Students changing their grades in their college's databases.
- Demanding a ransom from hospitals in exchange for not publicly disclosing health information.
With the move to public cloud computing, many databases are now hosted in data centers accessible from the Internet. As a result, human error has become a leading cause of database leaks: by simply checking a box, a careless administrator can expose a database to the public!
The solution is a "defence in depth" strategy that includes tight access controls, monitoring and, for cloud databases, configuration policies that prevent accidental exposure.
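As an illustration of such a configuration policy, here is a minimal sketch of a check that flags any database whose settings would expose it to the Internet. The settings format (`publicly_accessible`, `allowed_ips`) and the database names are entirely hypothetical; real cloud providers expose their own settings and policy tools.

```python
from dataclasses import dataclass, field

@dataclass
class DatabaseConfig:
    """Hypothetical settings for a cloud database (not any provider's real API)."""
    name: str
    publicly_accessible: bool = False
    allowed_ips: list[str] = field(default_factory=list)  # CIDR ranges allowed to connect

OPEN_TO_ALL = {"0.0.0.0/0", "::/0"}  # ranges that match every IPv4 / IPv6 address

def violations(config: DatabaseConfig) -> list[str]:
    """Return the policy rules this configuration breaks (empty list = compliant)."""
    problems = []
    if config.publicly_accessible:
        problems.append("public access is enabled")
    if OPEN_TO_ALL & set(config.allowed_ips):
        problems.append("firewall allows connections from any address")
    return problems

configs = [
    DatabaseConfig("payroll", allowed_ips=["10.0.0.0/16"]),
    DatabaseConfig("patients", publicly_accessible=True, allowed_ips=["0.0.0.0/0"]),
]
for cfg in configs:
    print(cfg.name, violations(cfg) or "OK")
```

The idea is that a check like this runs automatically before any change is deployed, so the careless checked box gets caught before the database is exposed.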
Data health and data quality are among the biggest challenges, if not the biggest, facing large organizations, due to any or all of the following:
- Data was amassed in siloed databases,
- Enterprises grew by acquisition, with subsidiaries having their own data model,
- Databases were modelled based on vendor limitations,
- Special characters are handled differently across systems (I have a hyphen in my name, and half of the banks have trouble dealing with it!)
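The hyphen problem is easy to reproduce. Below is a minimal sketch contrasting a hypothetical overly strict name check, of the kind a legacy system might use, with a more permissive one. Both regular expressions, and the sample names other than the author's, are illustrative assumptions, not any bank's actual rules.

```python
import re

# Hypothetical legacy rule: unaccented letters and spaces only
STRICT = re.compile(r"[A-Za-z ]+")
# More permissive rule: any Unicode letters, with single spaces, hyphens or apostrophes between words
PERMISSIVE = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")

def is_valid(pattern, name):
    """True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None

for name in ["Pierre-Paul Ferland", "Zoë O'Neil", "John Smith"]:
    print(f"{name!r}: strict={is_valid(STRICT, name)}, permissive={is_valid(PERMISSIVE, name)}")
```

The permissive rule still rejects digits and leading or trailing punctuation. Which characters to allow is a policy decision, and when two systems make different decisions, the same customer ends up stored two different ways.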
To avoid extremely costly and error-prone database migrations, data often needs to be duplicated (or "mirrored") on intermediate databases tasked with keeping the data in sync between the legacy systems. Such proliferation of data becomes one of the biggest information security risks for companies, because it is extremely hard to keep track of data flows and to apply coherent access policies across databases.
All this to say, even if AI masters SQL, database administrators, developers and security specialists still have plenty of work to do to keep databases healthy and secure.
Latest In Tech
Privacy and Cybersecurity
- Infamous hacker pompompurin arrested. I wrote about Purin's hack of poop delivery service ShitExpress (I shit you not) as part of a hacker rivalry. "pom" was running BreachForums, an online marketplace for breached data. The man was also allegedly involved in attacks on the FBI itself, as well as on DC Health Link. Kudos to the FBI! I guess "pom" got "pwned"! Story
Business of Tech
- President Biden sends an ultimatum to ByteDance: its Chinese owners must sell their stakes in TikTok, or the app gets banned. Alright, let's get this out of the way: I'm as annoyed as anyone seeing politicians and right-minded people pass judgment on an app they've clearly never used. However, concerns over Chinese government intervention in the app remain real: ByteDance could be compelled to hand over user data to the authorities, and TikTok could feed micro-targeted content to vulnerable users in order to shape their political beliefs. I'm all for using every political lever available to limit China's influence on TikTok. That said, I'm against a ban. Enforcing a ban on TikTok would require governments to act pretty much like China: censor the network at the ISP level, compel Apple and Google to remove the app from their stores, and somehow crack down on domestic VPN use and third-party app marketplaces. Do we really want teenagers to use shady VPNs and app stores to get their TikTok fix? I actually believe that what we are doing right now, raising concerns and yelling loudly at China, is what we need to do. We want creators (TikTok's product) and advertisers (TikTok's customers) to devalue the app based on this reputation. If politics don't cut it, market pressure may be the final push Beijing needs to surrender its ties to the app. Story
- Apple joins the ChatGPT trend with Siri. In the same week that the NYT published an article about how digital assistants such as Google Assistant, Siri and Alexa failed, news leaked that Apple is working on a more intelligent Siri that will be able to converse with users. The most interesting element for me is that, after more than a decade, most people only use Siri for basic things like opening an app. I think Apple forgets one thing in all this: people don't want to talk to Siri while jammed in the metro, no matter how smart it gets. Story
- Amazon goes after Google with a new web browser. The news is based on a survey Amazon sent to users. Amazon likely saw the "success" of Apple's anti-tracking measures and wants to eat further into Google's ad share. Amazon has a $30 billion ad business, most of which comes from sellers who want their products to appear at the top of its search results. That business is therefore largely immune to anti-tracking measures. Story
Artificial Intelligence
- While GPT-4 takes all the spotlight, AI image generator MidJourney launches v5, and people are freaking out! The images now look so realistic that fakes of celebrities and politicians will run rampant! See below for the portraits I made of Jennifer Lopez as a medieval bard. The v4 one does an excellent job, but the v5 one is much better: instead of working from some sort of "prototype" of J-Lo, it seems to start from what Lopez actually looks like these days. MidJourney had been harshly criticized for its trouble drawing hands, and I can't help but feel that it put fingers in my generation just to show off (and give the finger to critics, ah-ha!) Story
❓ Question of the Week
Which concept would you prefer me to explain next?
If you like my content, subscribe to the newsletter with the form below.
Cheers,
PP