Tyler's Blog

Just another data blog.

Author: tygar

  • Be Explicit in Production

    When building SQL queries, it can be tempting to lean on shortcuts. There’s a certain convenience in writing SELECT * instead of listing every column you actually need. It feels fast, flexible, and harmless. This is especially true during development, when schemas change frequently and you just want to explore your data. But habits formed in development tend to follow us into production, and once they do, these “harmless” shortcuts can become liabilities.

    The Hidden Cost of Being Implicit

    The trouble with non-explicit queries is that they rely on the current shape of the data rather than the intended shape. That is fine until something changes. Maybe a column gets added, renamed, or removed. Maybe data comes from a different environment than the one you tested against. Or maybe the system you depend on behaves slightly differently in production. Whatever the trigger, too-general queries introduce ambiguity, and ambiguity is where bugs hide.
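
    As a minimal sketch of that failure mode, the snippet below uses Python's built-in sqlite3 module with a hypothetical users table (the table, columns, and function names are assumptions made up for illustration). The SELECT * version silently depends on the table having exactly two columns, while the explicit version states the shape it expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for illustration.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")


def names_implicit(conn):
    # Relies on the table having exactly two columns, in this order.
    return [name for _id, name in conn.execute("SELECT * FROM users")]


def names_explicit(conn):
    # States the intended shape; columns added later are simply not selected.
    return [name for _id, name in conn.execute("SELECT id, name FROM users")]


before = (names_implicit(conn), names_explicit(conn))

# A later schema change adds a column the code never asked for.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

after_explicit = names_explicit(conn)
try:
    names_implicit(conn)
    implicit_error = None
except ValueError as exc:
    # Each row now has three values, so the two-name unpacking fails.
    implicit_error = exc
```

    Here the implicit query at least fails loudly. In many pipelines the extra column is passed along without complaint, which is harder to notice.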

    A Real-World Failure: Cloudflare

    A dramatic example of this played out recently when Cloudflare experienced its largest outage in six years. Their automated bot detection workflow depended on a SQL query that selected columns from a table without restricting the database or schema context that table belonged to. Because the query wasn’t explicit about where the table lived, it unexpectedly pulled in additional data from a newly created table of the same name in another database. Those extra columns cascaded into downstream systems, pushing internal constraints past their limits and ultimately causing widespread 5xx errors. What initially looked like a massive DDoS attack turned out to be the result of a missing condition in a SQL WHERE clause.

    This query worked, until it didn’t. And importantly, nothing inside the query would have warned engineers that it was too general. That’s the real lesson here: problems caused by lack of explicitness rarely announce themselves upfront. They surface only when the system evolves, often at the worst possible time and at a scale disproportionate to the simplicity of the mistake.
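
    The general pattern can be sketched with Python's built-in sqlite3 module. This is not Cloudflare's query or schema: catalog_columns, the database names, and the column names below are all hypothetical stand-ins for a metadata catalog. The point is only that a filter on the table name alone starts returning extra rows once a same-named table appears elsewhere, while a filter on database and table does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a metadata catalog; not a real system table.
conn.execute(
    "CREATE TABLE catalog_columns "
    "(database_name TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO catalog_columns VALUES (?, ?, ?)",
    [("prod", "features", "score"), ("prod", "features", "ua_hash")],
)


def columns_by_table(conn):
    # Too general: matches the table name in every database.
    sql = (
        "SELECT column_name FROM catalog_columns "
        "WHERE table_name = ? ORDER BY column_name"
    )
    return [row[0] for row in conn.execute(sql, ("features",))]


def columns_by_database_and_table(conn):
    # Explicit: pins both the database and the table.
    sql = (
        "SELECT column_name FROM catalog_columns "
        "WHERE database_name = ? AND table_name = ? ORDER BY column_name"
    )
    return [row[0] for row in conn.execute(sql, ("prod", "features"))]


before = columns_by_table(conn)

# Later, a table with the same name becomes visible in another database.
conn.executemany(
    "INSERT INTO catalog_columns VALUES (?, ?, ?)",
    [("shadow", "features", "score"), ("shadow", "features", "ua_hash")],
)

loose = columns_by_table(conn)                # now contains duplicates
strict = columns_by_database_and_table(conn)  # unchanged
```

    Nothing in the looser query changed between the two calls. Only the environment did, which is why this kind of bug is so easy to miss in review.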

    The Case for Explicitness in Production

    As developers, we can’t foresee every future schema change or integration nuance. But we can control how intentional we are in our production code. Explicitness, whether in SQL databases, API configurations, or data pipelines, provides guardrails that prevent subtle assumptions from turning into major outages. The cost of being specific, while seemingly tedious during development, is small. The cost of being vague is unpredictable. And unpredictability is the one thing production environments never tolerate well.
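
    One possible guardrail, sketched here under assumptions rather than offered as a recommended implementation, is to check the columns a query actually returns against the columns the code expects before using the rows. The helper fetch_checked and the readings table are hypothetical names; cursor.description is the standard DB-API attribute that sqlite3 provides.

```python
import sqlite3


def fetch_checked(conn, sql, expected_columns, params=()):
    """Run a query and fail fast if the result shape is not the expected one."""
    cursor = conn.execute(sql, params)
    # cursor.description holds one tuple per result column; the name is first.
    actual = [col[0] for col in cursor.description]
    if actual != list(expected_columns):
        raise RuntimeError(
            f"expected columns {list(expected_columns)}, got {actual}"
        )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for illustration.
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL, unit TEXT)")
conn.execute("INSERT INTO readings VALUES (7, 21.5, 'C')")

# The explicit query matches the expected shape and passes the check.
ok_rows = fetch_checked(
    conn, "SELECT sensor_id, value FROM readings", ["sensor_id", "value"]
)

# The implicit query returns an extra column and is rejected.
try:
    fetch_checked(conn, "SELECT * FROM readings", ["sensor_id", "value"])
    guard_error = None
except RuntimeError as exc:
    guard_error = exc
```

    A check like this does not replace writing explicit queries. It only turns a silent shape change into an immediate, readable error.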

  • Why You Don’t Need to Hire a Data Scientist

    When I transitioned from research to a Data Scientist role, I interviewed with a wide range of organizations, from early-stage start-ups to established international corporations to government institutions. Through the process I began to notice a pattern: everyone wanted to fill a “Data Scientist” role, but when I asked about their day-to-day data challenges, a different need emerged. Many teams lacked the foundational systems needed for reliable data collection, storage, and accessibility. They were excited about the possibilities of data insights, but their infrastructure still needed to catch up.

    The more conversations I had, the clearer it became that the title “Data Scientist” was often being used as a catch-all for a broader set of data needs.

    The Disconnect Between “Data Science” and Data Reality

    One paper manufacturing company wanted to forecast mechanical failures to plan preventive maintenance. It was an exciting goal, but they had no centralized system for collecting and storing the sensor information that a successful model would rely on.

    A bioinformatics institution hoped to apply machine learning techniques to predict protein-virus binding from lab results. The researchers had valuable datasets, but much of the information was siloed across different investigators and not easily shared or standardized.

    At ION, where I ultimately landed, the situation was different on the surface: they already had data entry and storage systems in place. But once I arrived, I realized those systems had been designed early in the company’s history and never evolved as the ceramic technology matured. Many features had become obsolete, forcing teams back into spreadsheets and manual file management.

    Across these experiences, the same pattern kept appearing. Organizations wanted to “jump into data insights,” but what they actually needed first was someone to wrangle their data infrastructure.

    What I Actually Did as a “Data Scientist”

    A large part of my early work at ION had little to do with statistical modeling or advanced analytics. Instead, I found myself helping the company strengthen the foundations needed to make data useful by:

    • Migrating our internally hosted document-share system to SharePoint
    • Building a full-stack platform to move Quality Control from spreadsheets to a database
    • Developing databases and pipelines to centrally collect and store cell testing data from multiple cycler providers

    Once these foundations were in place, it became possible to build tools that are more typically associated with a Data Scientist role. I began designing custom dashboards and BI reports that provided automated, real-time analytics for teams across the company.

    While these foundational projects ended up shaping many aspects of our data culture, I was not classically trained in data engineering, software development, or architecture design. I came from a background in statistics, modeling, and analytics. As a result, many of my early solutions were built with a focus on getting something working quickly rather than perfecting the long-term architecture, which became its own important learning experience.

    The Value of Imperfect, Fast Solutions

    Because I was not deeply trained in engineering best practices, and because a start-up needs to move quickly, my natural approach was to “get something working fast” rather than spend weeks or months designing the theoretically perfect system. And honestly, that was exactly what the company needed.

    ION’s technology was evolving quickly. Teams had limited bandwidth and the company was learning in real time. The cost of standing still was higher than the cost of building something that might need to be refined or replaced later. Delivering a working centralized database or an automated reporting system today could save teams hours of effort every week, even if it needed duct tape to hold it together over time.

    Those early, fast solutions were stepping stones. They revealed what worked, exposed pain points, and showed which ideas were worth investing in.

    From Quick Fixes to a Strategic Roadmap

    All those scrappy implementations gave me something more valuable than technical polish: a deep understanding of our real workflows, bottlenecks, team habits, and failure points. That foundation ultimately allowed me to design a long-term roadmap to modernize ION’s entire tech stack, one that:

    • Leverages existing systems wherever possible to minimize cost and disruption
    • Incorporates modern best practices and data governance
    • Creates scalable pathways for analytics, automation, and future AI applications that support the company’s rapid growth

    This kind of plan did not come from applying advanced machine learning or building sophisticated models. It came from being curious about how things worked, paying attention to what people actually needed, and being willing to learn whatever tools were required along the way.

    In other words, we did not need a traditional “Data Scientist” to move things forward. We needed someone adaptable, practical, and ready to connect technology with real-world use.

    So Before You Hire a Data Scientist…

    Take a step back and ask:

    • Do you consistently and accurately capture the data you need in the first place?
    • Is your data organized, accessible, and usable across teams?
    • Do you have an infrastructure that allows analyses or models to be deployed and maintained?

    If the answer to any of these is no, then a Data Scientist is probably not the role you need to fill right now. Instead, you likely need someone who can move your data environment from chaotic to useful. That might be a Data Engineer, a Full-Stack Developer, a Systems Architect, or simply someone with the curiosity and flexibility to build the foundations your organization is missing. The job title matters far less than matching the role to your actual data needs.

    But Yes, You Can Still Hire a Data Scientist

    A good Data Scientist is naturally curious, forward-thinking, and adaptable. They can absolutely be successful in companies at any stage, as long as expectations are aligned. However, when a company’s data architecture is still developing, employers should expect:

    • Early wins in automation, reporting, data wrangling, and workflow optimization
    • A middle stage where scaling requires structural changes
    • A later stage where MLOps, AI, and advanced analytics can finally thrive

    A Data Scientist can have a big impact, but that impact depends on how well their skills align with your current data maturity. Setting the right expectations is the first step toward turning that alignment into real success.