Episode 19 — Gather Requirements That Don’t Lie: Users, Storage, Objectives, and Constraints

In this episode, we’re going to focus on a skill that feels non-technical at first but quietly determines whether database systems succeed or fail: gathering requirements that reflect reality rather than wishful thinking. Beginners often assume database administration is mostly about choosing the right engine, creating tables, and tuning performance, but those choices only make sense when you know what the system is supposed to do, who depends on it, how much data it must handle, and what limits it must respect. Bad requirements are dangerous because they create false confidence; the system looks fine in early tests, then collapses when real users arrive, real data grows, and real constraints appear. For DataSys+, this matters because the exam expects you to reason about planning and operational fit, not only about mechanics, and planning begins with requirements. You will learn how to gather information about users and access patterns, how to think about storage and growth realistically, how to define objectives in measurable terms, and how to identify constraints that shape every decision. We will also talk about why people unintentionally provide misleading requirements, and how administrators can ask better questions to reveal the truth without being confrontational. The goal is not to turn you into a business analyst, but to give you a practical mental checklist that prevents the most common planning mistakes. By the end, you should be able to describe what good requirements look like and why good requirements make every technical decision easier and safer.

Before we continue, a quick note: this audio course has two companion books. The first covers the exam itself and explains how best to pass it. The second is a Kindle-only eBook containing 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

A strong requirement-gathering mindset begins with the idea that requirements describe a system’s promised behavior under real conditions, not the story someone hopes to tell about the system. People often state requirements in vague language, such as fast, secure, or scalable, because those words feel reassuring, but they do not tell you what to build or how to measure success. A database administrator learns to translate vague language into concrete questions, like how many users will access the system at peak, what response time is acceptable for common queries, and what happens if the system is unavailable for ten minutes. This translation matters because a database design that supports a few dozen users may not support thousands, and a storage plan that works for a small dataset may fail when the dataset grows by an order of magnitude. Beginners sometimes feel uncomfortable asking detailed questions because they worry they are being difficult, but detailed questions are a form of safety because they prevent costly surprises later. Another key point is that requirements include both what the system should do and what it must not do, such as exposing sensitive data or losing updates. Good requirements reduce ambiguity, and reducing ambiguity reduces risk. If you think of requirements as guardrails for future decisions, you start to appreciate why they must be honest and specific. The mental model is that requirements are the map, and a map that lies leads you straight into a ditch.

Users are the first requirement category because users drive workload, and workload drives almost every technical choice. When we say users, we do not only mean humans clicking buttons; we also mean applications, integrations, scheduled jobs, reports, and background processes that interact with the database. Each type of user behaves differently, and those differences shape query patterns, concurrency levels, and write frequency. For example, an interactive user might run short queries and expect quick responses, while a reporting process might run heavy queries that scan large sets and can tolerate longer runtimes. An integration might write bursts of data during certain hours, and a maintenance job might run overnight and compete for resources if not scheduled carefully. Beginners often assume users are a single category, but a realistic system has multiple workload types, and the database must support them together. Requirement gathering should capture peak usage, not only average usage, because peak is where systems fail. It should also capture whether usage is steady or spiky, because spiky workloads can require different scaling strategies than steady workloads. Another user detail is access profile, meaning who needs read access, who needs write access, and who needs administrative privileges, because permissions and least privilege design depend on that. The mental model is that users are not just people; they are workload sources with different behaviors and different expectations.
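To make the idea of users as workload sources concrete, here is a minimal Python sketch that totals peak sessions and write rates across several source types. The category names and numbers are invented for illustration; the point is that you size for the combined peak, not for any one category's average.

```python
# Hypothetical workload sources; figures are illustrative, not recommendations.
workload_sources = {
    "interactive_users": {"peak_sessions": 400, "writes_per_sec": 20},
    "reporting_jobs":    {"peak_sessions": 10,  "writes_per_sec": 0},
    "integrations":      {"peak_sessions": 25,  "writes_per_sec": 150},
    "maintenance_jobs":  {"peak_sessions": 5,   "writes_per_sec": 40},
}

# Plan for the worst case where every source peaks at once.
peak_sessions = sum(s["peak_sessions"] for s in workload_sources.values())
peak_writes = sum(s["writes_per_sec"] for s in workload_sources.values())
print(peak_sessions, peak_writes)  # 440 210
```

Even this toy breakdown surfaces a useful fact: the integrations, not the humans, dominate write volume, which is exactly the kind of detail a single "number of users" figure hides.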

Understanding how users will use the data requires you to capture access patterns, because access patterns determine what data structures and indexes will be effective. An access pattern is the typical way the system asks for data, such as fetching a record by identifier, searching by a date range, joining multiple tables for a report, or aggregating counts by category. Beginners sometimes collect requirements in terms of features, like the system must show customer history, but the database needs details like how often that history is fetched, how many rows it typically includes, and whether it must be sorted or filtered in specific ways. This is where set-based thinking becomes practical: if a common access pattern requires joining large tables and ordering results, performance planning becomes more important. If a common access pattern is simple lookups by key, the system may prioritize fast indexed retrieval and caching. Another access pattern consideration is whether queries are mostly predictable or ad hoc, because predictable queries can be optimized for specific shapes, while ad hoc queries require flexible indexing and careful resource controls. You also want to learn whether certain operations must be real-time or can be delayed, because delayed operations can be handled in background workflows that reduce peak pressure. Beginners often assume every feature must be instant, but real systems often benefit from separating urgent operations from non-urgent ones. When you capture access patterns honestly, you create a foundation for making database choices that fit reality. The mental model is that access patterns are the workload blueprint that the database must satisfy.
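As a small illustration of how an access pattern drives indexing, the following sketch uses Python's standard-library sqlite3 module to run the same lookup-by-customer query before and after adding an index that matches that pattern. The table, data, and index name are hypothetical; the exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, placed_on TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, placed_on) VALUES (?, ?)",
    [(i % 100, "2024-01-%02d" % ((i % 28) + 1)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index, the customer-history lookup scans the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

# Add an index that matches the access pattern, then re-check the plan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

print(plan_before[0][-1])  # e.g. "SCAN orders"
print(plan_after[0][-1])   # e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

The design choice here is the whole lesson in miniature: the index is only worth creating because the requirement told us "fetch history by customer" is a frequent pattern.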

Storage requirements are another area where people unintentionally lie, not because they want to deceive you, but because they underestimate growth and they focus on what exists today instead of what will exist later. Storage is not only the current size of data; it is the combination of data volume, growth rate, retention needs, backups, logs, indexes, and overhead. Beginners sometimes treat storage as a single number, like the database is ten gigabytes, but a real storage plan must account for how quickly that number changes, what kinds of data are being stored, and how long the organization must keep it. Growth rate is especially important because a system that grows slowly can be planned differently than one that grows rapidly, and rapid growth can cause performance changes because indexes and scans behave differently as tables expand. Retention requirements can also be surprising, because regulations or business needs might require keeping historical data for years, which affects storage cost and query performance. Backups and logs often consume significant storage too, and if they are ignored, an environment can run out of space even when the main data files seem modest. Another storage detail is the shape of data, because wide rows with large text fields create different storage and I O needs than narrow rows with small numeric fields. A requirement that does not lie includes both current size and realistic projections, along with what kinds of data contribute to that size. The mental model is that storage is a moving target, and planning must follow the movement.
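A realistic projection can be as simple as compounding a growth rate and multiplying by an overhead factor for indexes, logs, and backups. This is a rough sketch; the 8 percent monthly growth and 1.5 times overhead factor are illustrative assumptions, not recommendations.

```python
def projected_size_gb(current_gb, monthly_growth_rate, months, overhead_factor=1.5):
    """Compound monthly data growth, then add overhead for indexes, logs, and backups."""
    data = current_gb * (1 + monthly_growth_rate) ** months
    return data * overhead_factor

# "The database is ten gigabytes" becomes a very different number two years out:
print(round(projected_size_gb(10, 0.08, 24), 1))  # ≈ 95.1
```

The exact factors matter less than the habit: a storage requirement that states current size, growth rate, and overhead cannot quietly lie the way a single snapshot number can.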

Objectives are the measurable outcomes the system must achieve, and this is where you turn vague goals into testable targets. If someone says the system must be fast, you ask what fast means for the most important operations, such as a typical lookup, a typical report, or a typical batch load. If someone says the system must be available, you ask how much downtime is acceptable and what kinds of downtime are acceptable, because a planned maintenance window is different from an unexpected outage during peak. If someone says the system must be reliable, you ask what data loss is acceptable during failure scenarios, because some systems can tolerate losing a few minutes of data while others cannot tolerate losing a single update. Beginners sometimes think objectives are only performance objectives, but objectives also include security objectives, such as limiting who can view sensitive fields, and operational objectives, such as how quickly issues must be detected and responded to. You can also think of objectives in terms of service level behavior, like response time targets and recovery targets, even if you do not use formal service level vocabulary. The key is that objectives should be specific enough to test, because a requirement that cannot be tested is an opinion, not a requirement. When objectives are clear, technical decisions become easier because you can evaluate whether a choice meets the objective. The mental model is that objectives are the success criteria, and without them, you cannot know if a design is good.
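One way to turn "fast" into a testable target is to measure a percentile of observed response times against a numeric threshold. The sketch below uses a simple nearest-rank percentile and invented latency samples; the 500-millisecond target is a hypothetical objective, not a benchmark.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample at or above the pct rank."""
    ordered = sorted(samples)
    k = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[max(0, k)]

# Hypothetical response times in milliseconds for a common lookup.
latencies_ms = [120, 95, 210, 130, 480, 110, 150, 90, 300, 140]

p95 = percentile(latencies_ms, 95)
target_ms = 500
print(p95, p95 <= target_ms)  # 480 True
```

A statement like "95 percent of lookups complete within 500 milliseconds" is a requirement you can test; "the system must be fast" is not.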

Constraints are the limits that shape what solutions are possible, and they are often where the truth hides because constraints force tradeoffs. Constraints can include budget constraints, staffing constraints, time constraints, and technology constraints, such as an organization standardizing on a particular platform. They can include regulatory constraints, such as data residency rules that require data to remain in a certain geographic location. They can include security constraints, such as requiring encryption, strong auditing, and strict access controls. They can also include operational constraints, such as limited maintenance windows or limited tolerance for complex migrations. Beginners sometimes view constraints as obstacles, but constraints are actually useful because they prevent you from designing an unrealistic system that cannot be operated. For example, a design that requires constant expert tuning may not be appropriate if the organization has limited database staffing. A design that assumes downtime for large schema changes may not be appropriate if the business requires near-continuous availability. The trick is to surface constraints early rather than discovering them during deployment. A requirement that does not lie is honest about what is not negotiable and what can be compromised. On the exam, constraints often appear as scenario details, and the correct answer usually respects those constraints rather than ignoring them. The mental model is that constraints are the boundaries of the solution space, and ignoring boundaries leads to failure.

Another category that deserves attention is data sensitivity, because requirements gathering must include what kind of data is stored and how sensitive it is. If the database will store personally identifiable information, financial data, health-related records, or other sensitive categories, that changes security requirements dramatically. It affects who can access the data, how it must be encrypted, how access must be logged, and how long data must be retained. It also affects incident response, because the consequences of a breach are higher, and the organization may have legal obligations. Beginners sometimes think of data sensitivity as a checkbox, but it is more like a design driver that shapes permissions, auditing, and even environment separation between development and production. For example, if production data is sensitive, test environments might need masked or synthetic data, which affects how data is copied and how developers work. Sensitivity also affects data deletion requirements, because some regulations require data to be deleted under certain conditions, while other regulations require data to be retained for a certain period, and those can conflict. Understanding sensitivity is also part of risk management, because it helps prioritize controls and monitoring. On the exam, sensitivity may appear indirectly in phrases about compliance, audit trails, or restricted access, and good answers usually include stronger controls in those scenarios. The mental model is that what data you store determines how careful you must be, and requirements must capture that.
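As one small example of the masked-data idea mentioned above, here is a hypothetical masking helper of the kind a test environment might use. Real masking policies are broader and driven by compliance requirements; this only illustrates the principle that developers can work with realistic shapes of data without seeing the sensitive values.

```python
def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

print(mask_email("jane.doe@example.com"))  # j***@example.com
```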

When gathering requirements, you also want to capture failure scenarios explicitly, because systems are defined not only by how they work when everything is fine, but by how they behave when something goes wrong. A realistic requirement set includes questions like what happens if the primary database server fails, what happens if a region becomes unreachable, and what happens if a bad deployment introduces incorrect data. It also includes recovery expectations, such as how quickly the system must be restored and how much data can be lost in the worst case. Beginners sometimes assume failure planning is advanced, but it is basic operational maturity because failures are normal events in complex systems. Another important failure scenario is human error, such as accidental deletion or running a change against the wrong environment, because many incidents come from mistakes rather than from hardware failure. This is where requirements about backups, point-in-time recovery, and audit trails become concrete, because you are planning for realistic mistakes. Capturing failure scenarios also forces clarity about tradeoffs, because perfect resilience is expensive and complex, and organizations must choose what level of resilience is necessary. On the exam, you may see questions where the best answer depends on recognizing the required recovery behavior. If the requirement is to recover quickly with minimal loss, you choose a design that supports that, while if the requirement tolerates longer recovery, you might choose simpler options. The mental model is that failure scenarios are part of the expected operating environment, so requirements must include them.
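The recovery questions above reduce to simple checks once the numbers are on the table. This sketch assumes a worst-case reading, where failure strikes just before the next backup; the function names and figures are illustrative, and real recovery planning also accounts for restore verification time.

```python
def meets_rpo(backup_interval_min, rpo_min):
    """Worst case: failure just before the next backup loses one full interval of data."""
    return backup_interval_min <= rpo_min

def meets_rto(restore_min, rto_min):
    """Can the restore procedure finish within the allowed downtime?"""
    return restore_min <= rto_min

# Hourly full backups against a 15-minute data-loss tolerance:
print(meets_rpo(60, 15))  # False: up to an hour of updates could vanish
# Five-minute log shipping against the same tolerance:
print(meets_rpo(5, 15))   # True
# A 45-minute restore against a 30-minute recovery target:
print(meets_rto(45, 30))  # False: change the design or renegotiate the target
```

Framed this way, "how much data can we lose" and "how long can we be down" stop being uncomfortable questions and become inputs to arithmetic.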

People often provide misleading requirements because they answer based on what they know today, what they wish were true, or what they think you want to hear, and understanding this helps you gather better information without being cynical. A product owner might underestimate peak usage because they only see current traffic and do not anticipate growth. A developer might describe a query as simple because it looks simple in code, not realizing the database work it creates. A manager might say the system must never go down because that sounds strong, but when pressed, they might accept planned maintenance or brief outages. A compliance team might list many controls without clarifying which are legally required versus which are preferred. A beginner might think requirement gathering is about taking notes, but it is more like careful interviewing, where you ask follow-up questions that turn vague statements into measurable claims. You can also validate requirements by comparing them to evidence, such as current usage logs, current data growth, and known business cycles, because evidence reduces guesswork. Another technique is to ask for examples, like describing a typical day, a peak day, and a worst-case day, because real usage patterns often vary. Requirements that do not lie often emerge from this combination of questions and evidence. For DataSys+, this mindset matters because administrators must be able to plan based on reality, not based on slogans. The mental model is that requirements gathering is a truth-finding process, and truth often appears in details.

Bringing these threads together, you can see that honest requirements gathering is about describing users and workloads accurately, describing storage and growth realistically, defining objectives in measurable terms, and identifying constraints and sensitivities that shape every design choice. Users include humans and systems, and their access patterns determine which query shapes, indexes, and scaling approaches will work. Storage is not a single number; it includes growth, retention, backups, and overhead that can surprise you if ignored. Objectives define what success looks like and must be testable so you can evaluate designs and operations. Constraints and sensitivity define what is possible and what must be protected, including security controls, compliance needs, staffing realities, and downtime tolerance. Failure scenarios are part of requirements because resilience and recovery behavior must be chosen deliberately rather than discovered during an outage. Finally, gathering requirements that do not lie requires asking better questions, using evidence, and translating vague language into concrete, measurable statements. If you can do that, you make every downstream technical decision safer because you are designing for the world you actually have. This sets you up for the next episode, because once requirements are clear, you can make a more rational decision about cloud versus on-premises environments, weighing cost, control, and operational fit against those real needs. Requirements are the foundation that keeps design honest, and honest design is the first step toward dependable systems.
