Take advantage of our conference discount and book your room at the AT&T Conference Hotel.

The Data Day Texas 2025 Sessions

All Your Base Are Belong To Us: Adversarial Attack and Defense

Michelle Yi - Women in Data

The increasing deployment of generative AI models in production environments introduces new security challenges, particularly in the realm of adversarial attacks. While visually or textually subtle, these attacks can manipulate generative models, leading to harmful consequences such as medical misdiagnoses from tampered images or the spread of misinformation through compromised chatbots. This talk examines the vulnerabilities of generative models in production settings and explores potential defenses against adversarial attacks. Drawing on insights from attacks against Vision-Language Pre-training (VLP) models, which are a key component in text-to-image and text-to-video models, this talk highlights the importance of understanding cross-modal interactions and leveraging diverse data for crafting robust defenses.

Escape the Data & AI Death Cycle, Enter the Data & AI Product Mindset

Anne-Claire Baschet - Mirakl
Yoann Benoit - Hymaïa

On the horizon, there's a transformation underway where every digital product will encompass Data & AI capabilities.

However, we must recognize that Data and Product teams have distinct cultures and origins. Data teams possess an array of tools and technical expertise, yet they often struggle with quantifying the value they deliver. They frequently miss the mark in addressing the right problems that align with customer needs or in collaborating with Business-Product-Engineering teams.

This is where adopting a Product Mindset becomes paramount. Closing the divide between the Data and Product communities is imperative, as both groups must collaborate on a daily basis to create value for users and support businesses in reaching their goals.
In this talk, you will get insights into:
• Identifying and overcoming the most common traps that Data Teams fall into when delivering Data & AI initiatives
• Crafting impactful Data & AI Products that solve the right problems
• Scaling a Data & AI Product Culture throughout the whole organization and defining a Data & AI Product Strategy.

The Outcomes Economy: A Technical Introduction To AI Agentic Systems, Multi-Simulations, & Ontologies

Vin Vashishta - V Squared

Linear has given way to exponential. Digital apps are tools for agents. Data models complex and dynamical systems. The goal is models training models and building their own tools, but nothing is designed to support that today. AI platforms must follow new architectural tenets.
The AI platform roadmap must be designed to accept the realities of where businesses are today: low data maturity and resistance to change. Businesses are in a state of continuous transformation or managed decline. As Sam Altman said, “Stasis is a myth,” which means startups and SMBs have a new competitive advantage.
The speaker will deep dive into AI platforms' three primary architectural components. He’ll explain how they are constructed using real-world case studies from emerging AI platforms. This talk will touch on complex and dynamical systems modeling and where ontologies fit. The talk wraps up with a pragmatic approach to aligning technology with the business and its customers.

The human side of data: Using technical storytelling to drive action

Annie Nelson - GitLab / Annie's Analytics

Join Annie for a session on learning the art of technical storytelling. Drawing from her background in psychology and experience as a data analyst, Annie will share strategies that go beyond just communicating data - how to influence stakeholders from the start and throughout a project’s lifecycle. Whether you're at the kickoff of a project, guiding decisions along the way, or presenting final results, the way you tell the story can have a big impact on its success.
In this session, Annie will explore a practical framework for crafting technical stories that not only explain data but also build trust, influence decision-making, and inspire action at every stage of the process. She will also provide real-world examples of how to tailor your message for both technical teams and business leaders, so you can engage all of your stakeholders effectively. You’ll leave with actionable techniques that help you drive results by tapping into an overlooked tool in data: emotion.

How to Start Investing in Semantics and Knowledge: A Practical Guide

Juan Sequeda - Data.World

What do enterprises lose by not investing in semantics and knowledge? The ability to reuse data effectively, because they lack context and an understanding of what the data means. How is AI going to use data if we don't even understand it? This is why we waste so much time and money and lack strategic focus.
Many practitioners are already doing critical data and knowledge work, but it’s often overlooked and treated as second-class. In this talk, I will focus on practical knowledge engineering steps to start investing in semantics and knowledge and demonstrate how to elevate this data and knowledge work as a first-class citizen.
We’ll explore four key areas: communication, culture, methodology and technology. The goal is for attendees to leave with concrete steps on how to start investing in semantics and knowledge today, empowering them to be efficient and resilient.

Deployment at scale of an AI system based on custom LLMs: technical challenges and architecture

Arthur Delaitre - Mirakl

Mirakl is transforming seller catalog onboarding through the deployment of a scalable AI system based on custom fine-tuned Large Language Models (LLMs) and state-of-the-art multimodal models. Traditional onboarding processes can take up to two months; the new system reduces this to mere hours, efficiently handling millions of products.
This presentation will delve into the technical challenges and architectural solutions involved in deploying custom LLMs at scale. Key topics include:
• Infrastructure Deployment: Building scalable environments for LLM inference.
• Model Fine-Tuning: Customizing LLMs and improving quality by reducing hallucinations and increasing consistency.
• Micro-Service Architecture: Orchestrating between model services and hosting for efficient operation. Synergies of systems containing LLMs and other ML models.
• Layered Approach: Selecting optimal results while minimizing computational costs.
Arthur will explore how these technologies are integrated into a production-ready system, discussing the strategies used to overcome scaling challenges and ensure high performance. Attendees will gain insights into deploying advanced AI systems in real-world environments, optimizing large-scale inference, and setting new industry standards in marketplace technology.

Automating Financial Reconciliation with Linear Programming and Optimization

(90-minute deep dive session)
Bethany Lyons - Assured Insights

Some of the gnarliest data quality problems arise from the absence of relationships in data that exist in the world. Suppose you've raised multiple payment requests for $1000, $2000 and $3000. Then $6000 hits your bank account. Those three invoices should be linked to the $6000 payment, but many systems fail to capture those links. As a result, you have to infer the relationships after the fact through a series of computational math techniques. This session will take you through real world examples and challenges of such a solution, with broad applications across finance and financial services.
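To make the invoice example concrete, here is a minimal sketch of the matching problem in Python. It is not the session's method: it swaps in plain brute-force enumeration of invoice subsets for the linear programming and optimization techniques named in the title, and the function name, invoice IDs, and the extra fourth invoice are all hypothetical.

```python
from itertools import combinations


def match_invoices(invoices, payment_cents):
    """Return every combination of invoice IDs whose amounts sum exactly to the payment.

    invoices: dict mapping a (hypothetical) invoice ID to its amount in cents.
    payment_cents: the amount that arrived in the bank account, in cents.
    """
    matches = []
    ids = sorted(invoices)
    # Try subsets of increasing size; this is exponential in the number of
    # open invoices, so it only illustrates the problem on tiny inputs.
    for size in range(1, len(ids) + 1):
        for combo in combinations(ids, size):
            if sum(invoices[i] for i in combo) == payment_cents:
                matches.append(combo)
    return matches


# Hypothetical open invoices; amounts are in cents to avoid float rounding.
open_invoices = {
    "INV-1": 100_000,  # $1000
    "INV-2": 200_000,  # $2000
    "INV-3": 300_000,  # $3000
    "INV-4": 450_000,  # $4500, unrelated to the payment
}

print(match_invoices(open_invoices, 600_000))
# [('INV-1', 'INV-2', 'INV-3')]

print(match_invoices(open_invoices, 300_000))
# [('INV-3',), ('INV-1', 'INV-2')]
```

The second call shows why the inference is hard: a $3000 payment is consistent with more than one set of invoices, so a real system needs additional signals or an optimization objective to choose between candidates.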

Unleashing the Power of Multimodal GraphRAG: Integrating Image Features for Deeper Insights

David Hughes

GraphRAG has proven to be a powerful tool across various use cases, enhancing retrieval accuracy, language model integration, and delivering deeper insights to users. However, a critical dimension remains underexplored: the integration of visual data. How can images—so rich in contextual and relational information—be seamlessly incorporated to further augment the power of GraphRAG?
In this presentation, we introduce Multimodal GraphRAG, an innovative framework that brings image data to the forefront of graph-based reasoning and retrieval. By extracting meaningful objects and features from images, and linking them with text-based semantics, Multimodal GraphRAG unlocks new pathways for surfacing insights. From images embedded in documents to collections of related visuals, we’ll demonstrate how this approach enables more comprehensive understanding, amplifying both the depth and accuracy of insights.

Optimisation Platforms for Energy Trading

Adam Sroka - Hypercube

As the energy sector transitions to new technologies and hardware, the data requirements are undergoing significant changes. At the same time, the markets in which energy systems operate are also evolving - giving traders and energy teams a vastly more complex set of options against which they need to make decisions.
The move to real-time data for battery energy storage system (BESS) operation and the addition of multiple markets make optimisation of revenue for storage assets intractable for human operators alone.
In this talk, Adam Sroka will walk through one solution deployed at a leading BESS trading company in the UK that aligned probabilistic forecasting and stochastic methodologies with a linear optimisation engine to determine the best markets, prices, and trades for any given portfolio of mixed energy and storage assets.
Adam will walk through an architecture diagram for a system that integrates real-time, near real-time, and slow-moving data with AI-driven forecasts and the complexities of optimisation management.
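As a rough illustration of the kind of decision such a system has to make, the sketch below schedules a single toy battery against one price forecast in Python. It is an assumption-laden simplification, not the deployed system: it uses a small dynamic program rather than a linear optimisation engine, takes a single point forecast instead of a probabilistic one, considers one market only, and ignores efficiency losses and degradation. The function name and all numbers are hypothetical.

```python
def best_schedule(prices, capacity=1, start_soc=0):
    """Maximise trading revenue for a toy battery over a price forecast.

    prices: forecast price per period (currency per unit of energy).
    capacity: maximum stored energy, in whole units.
    start_soc: state of charge at the start, in whole units.

    In each period the battery may discharge one unit (-1, earns the price),
    idle (0), or charge one unit (+1, pays the price).
    Returns (revenue, actions) for the best plan found.
    """
    # best[soc] = (revenue, actions) for the best plan ending at this state of charge
    best = {start_soc: (0.0, [])}
    for price in prices:
        nxt = {}
        for soc, (revenue, actions) in best.items():
            for action in (-1, 0, 1):  # discharge, idle, charge
                new_soc = soc + action
                if not 0 <= new_soc <= capacity:
                    continue  # cannot go below empty or above full
                new_revenue = revenue - action * price
                if new_soc not in nxt or new_revenue > nxt[new_soc][0]:
                    nxt[new_soc] = (new_revenue, actions + [action])
        best = nxt
    return max(best.values(), key=lambda item: item[0])


# Hypothetical hourly price forecast.
forecast = [30, 10, 50, 20, 60]
revenue, actions = best_schedule(forecast, capacity=1)
print(revenue)  # 80.0
print(actions)  # [0, 1, -1, 1, -1]
```

Even in this toy form the plan skips the first hour, buys at 10, sells at 50, buys at 20, and sells at 60. Adding more markets, more assets, and uncertain forecasts is what pushes the real problem beyond manual operation.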

The Future of Data Education and Publishing in the Era of AI

Jess Haberman - Anaconda
Michelle Yi - Women in Data
Hala Nelson - James Madison University

With easier access to expert knowledge, we are in the midst of a significant shift in the technical education and publishing landscapes. Do these advancements propel us toward educational bliss or do they pose unprecedented threats to industry and academia? Join us as we seek to unravel the future of data and tech education.
The surge of generative AI content sparks a range of debates: Does it herald a new era of learning or threaten academic integrity? Will AI augment or overshadow human-generated educational materials? What implications does the proliferation of AI-generated content hold for authors and the discoverability of their work? Does democratized access to generative AI writing tools make our writing better and more efficient, or simply more generic? We will delve into the ramifications of AI tools on writing, teaching, and student learning, exploring the opportunities they present for knowledge dissemination and the concerns they raise with regard to content quality and correctness. Join us for a discussion on the future of data education and its transformative impact on the realms of technology, academia, and publishing.

Our esteemed panelists bring education and publishing perspectives:
Hala Nelson: Associate Professor of Mathematics at James Madison University and author of Essential Math for AI.
Michelle Yi: Board Member at Women in Data and advocate for STEM education among underrepresented minorities.
Jess Haberman (panelist and moderator): Content and education leader at Anaconda, leveraging 14 years of publishing experience, including as an acquisitions editor at O’Reilly Media.