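Data + AI Summit 2022 was held from June 27 to June 30, 2022. The conference took place in San Francisco, and readers in China could follow it online. It ran for four days: the first day was training, and the remaining days were the conference proper. There were more than 200 sessions, with speakers from industry, research, and academia. The conference was organized into six tracks:
- Data analytics, BI and visualization: learn about the latest data analytics, BI and visualization technologies, along with solutions from customers and the community.
- Data engineering: dive into the latest in data engineering, from implementing data pipelines and managing data quality, through ETL and data quality frameworks, to DataOps.
- Data Lakes, Data Warehouses and Data Lakehouses: learn the concepts and best practices behind the evolution of data lakes and data warehouses into Data Lakehouses.
- Data science, machine learning and MLOps: learn the techniques and best practices for production data science and machine learning pipelines.
- Data security and governance
- Academic research: dedicated to academic and advanced industrial research, including large-scale schedulers, graphs, data analytics and machine learning systems.
For the full conference agenda, see: https://databricks.com/dataaisummit/agenda
The Day 1 keynote announced several important items: the future direction of Apache Spark, the next-generation Structured Streaming solution, and the open-sourcing of all Delta Lake features. Besides the Day 1 keynote, the following sessions are also recommended: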
- Apache Spark SQL Aggregate Improvement at Meta (Facebook)
- Recent Parquet Improvements in Apache Spark
- Spark Data Source V2 Performance Improvement: Aggregate Push Down
- Deep Dive into the New Features of Apache Spark 3.2 and 3.3
- Managing Straggler Executors at Apache Spark 3.3
- Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
- PySpark in Apache Spark 3.3 and Beyond
- Delta Lake 2.0 Overview
- Improving Interactive Querying Experience on Spark SQL
- Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
- Radical Speed on the Lakehouse: Photon Under the Hood
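How to download the ultra-HD videos
Since readers may be interested in different topics, all of the downloadable videos are collected here, all in ultra-HD, so you can download and watch whichever ones interest you. The conference slides are not yet available for download; readers who want the slides can keep following this WeChat official account for updates.
Follow the WeChat official account 过往记忆大数据 or Java与大数据架构 and reply 10187 to get the Data + AI Summit 2022 ultra-HD videos.
Sessions with downloadable videos
A total of 197 sessions have downloadable videos.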
- A Low-Code Approach to 10x Data Engineering
- A Modern Approach to Big Data for Finance
- A Practitioner's Guide to Unity Catalog—A Technical Deep Dive
- Accelerating Hybrid Data Mesh Implementation
- Accidentally Building a Petabyte-Scale Cybersecurity Data Mesh in Azure With Delta Lake at HSBC
- Administrator Best Practices and Tips for Future-proofing your Databricks Account
- Advanced Migrations From Hive to SparkSQL
- Adversarial Drifts Model Monitoring and Feedback Loops Building Human-in-the-Loop Machine Learning Systems for Content Moderation
- An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
- Analyzing Population Health using Healthcare Claims
- Apache Arrow Flight SQL: High Performance, Simplicity, and Interoperability for Data Transfers
- Apache Spark SQL Aggregate Improvement at Meta (Facebook)
- Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
- Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
- Automating Model Lifecycle Orchestration with Jenkins
- Backfill Streaming Data Pipelines in Kappa Architecture
- Batches Streams and Everything in between Unifying Batch and Stream Storage with Apache Pulsar and Lakehouse Architectures
- Beyond Daily Batch Processing Operational Trade-Offs of Microbatch Incremental and Real-Time Processing for Your ETLs (and Your Team's Sanity)
- Beyond Monitoring The Rise of Data Observability
- Build an Enterprise Lakehouse for Free with Trino and Delta Lake
- Building Enterprise Scale Data and Analytics Platforms at Amgen
- Building Metadata and Lineage Driven Pipelines on Kubernetes
- Building Production-Ready Recommender Systems with Feature Stores
- Building Scalable & Advanced AI based Language Solutions for R&D using Databricks
- Building Spatial Applications with Apache Spark and CARTO
- Building a Lakehouse for Data Science at DoorDash
- Building a Lakehouse on AWS for Less with AWS Graviton and Photon
- Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
- Building and Scaling Machine Learning-Based Products in the World's Largest Brewery
- Chaos Engineering in the World of Large-Scale Complex Data Flow
- Cloud Native Geospatial Analytics at JLL
- Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
- Complete Data Security and Governance Powered by Unity Catalog and Immuta
- Connecting the Dots with DataHub Lakehouse and Beyond
- Constraints, Democratization, and the Modern Data Stack - Building a Data Platform At Red Ventures with Fivetran and Databricks
- Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines
- Correlation Over Causation Cracking the Relationship Between User Engagement and User Happiness
- Cutting the Edge in Fighting Cybercrime Reverse-Engineering a Search Language to Cross-Compile it to PySpark
- DELETE UPDATE MERGE Operations in Data Source V2
- Data Lakehouse and Data Mesh—Two Sides of the Same Coin
- Data Mesh Implementation Patterns
- Data Warehousing on the Lakehouse
- DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine
- Databricks Lakehouse Overview
- Databricks SQL Under the Hood: What's New with Live Demos
- Day 1 Afternoon Keynote
- Day 2 Afternoon Keynote
- Day 2 Opening Keynote
- Deep Dive How to Build Your Modern Data Stack on Databricks to Solve Modern Problems
- Deep Dive into the New Features of Apache Spark 3.2 and 3.3
- Deliver Faster Decision Intelligence From Your Lakehouse
- Delta Lake, the Foundation of Your Lakehouse
- Delta Live Tables Modern software engineering and management for ETL
- Delta Sharing - A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
- Delta Sharing for Healthcare and Life Sciences
- Democratizing Metrics at Airbnb
- Designing Better MLOps Systems
- Destination Lakehouse All Your Data Analytics and AI on One Platform
- Distributed Machine Learning at Lyft
- Dive Deeper into Data Engineering on Databricks
- Doubling the Capacity of the Data Platform Without Doubling the Cost
- Driving Real-Time Data Capture and Transformation in Delta Lake with Change Data Capture
- Efficient and Multi-Tenant Scheduling of Big Data and AI Workloads
- Eliminating AI Risk—One Model Failure at a Time
- Emerging Data Architectures & Approaches for Real-Time AI using Redis
- Enable Production ML with Databricks Feature Store
- Enabling Advanced Analytics at The Department of State using Databricks
- Enabling BI in a Lakehouse Environment How Spark and Delta Can Help With Automating a DWH Development
- Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code
- Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
- Entity Resolution
- Evolution of Data Architectures and How to Build a Lakehouse
- Financial Services Industry Forum: The Future of Financial Services is Open with Data and AI at Its Core
- Fugue Tune Distributed Hybrid Hyperparameter Tuning
- FugueSQL—The Enhanced SQL Interface for Pandas and Spark DataFrames
- FutureMetrics Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
- Gamer User Toxicity
- Gazelle-Jni: A Middle Layer to Offload Spark SQL to Native Engines for Execution Acceleration
- Government Industry Forum Lunch and Program
- Hassle-Free Data Ingestion into the Lakehouse
- Healthcare Data Interoperability
- How AARP Services Inc. automated SAS transformation to Databricks using LeapLogic—A cloud accelerator for transformation of legacy analytics ETL DW & Hadoop
- How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks with Two Different Approaches using Photon and RAPIDS Accelerator for Apache Spark
- How Databricks is driving disruptive digital transformation in the airline industry
- How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
- How McAfee Leverages Databricks on AWS at Scale
- How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
- How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
- How To Use Databricks SQL for Analytics on Your Lakehouse
- How to Implement a Semantic Layer for Your Lakehouse
- How unsupervised machine learning can scale data quality monitoring in Databricks
- Immuta - Unlocking sensitive use cases with automated data access
- Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
- Implementing an End-to-End Demand Forecasting Solution Through Databricks and MLflow
- Improving Apache Spark Structured Streaming Application Processing Time by Configurations Code Optimizations and Custom Data Source
- Improving Interactive Querying Experience on Spark SQL
- Introducing Zipline An Open Source Feature Engineering Platform
- Introduction to Flux and OSS Replication
- Lakehouse with Delta Lake Deep Dive
- Laying the Foundation for Claims Automation
- Learn to Efficiently Test ETL Pipelines
- Lessons Learned from Deidentifying 700 Million Patient Notes
- Leveraging ML-Powered Analytics for Rapid Insights and Action a demonstration
- Live Analytics: The next user engagement frontier
- Low-Code Machine Learning on Databricks with AutoML
- ML on the Lakehouse Bringing Data and ML Together to Accelerate AI Use Cases
- MLOps at DoorDash
- MLflow Pipelines Accelerating MLOps from Development to Production
- Managing Straggler Executors at Apache Spark 3.3
- Meetup Women in Data and AI
- Meshing About with Databricks
- Migrate Your Existing DAGs to Databricks Workflows
- Migrate and Modernize your Data Platform with Confluent and Databricks
- Migrating Complex SAS Processes to Databricks - Case Study
- Migrating SAS to a Lakehouse on Databricks and S3
- Monitoring and Quality Assurance of Complex ML Deployments via Assertions
- More Context Less Chaos How Atlan and Unity Catalog Power Column-Level Lineage and Active Metadata
- Mosaic: A Framework for Geospatial Analytics at Scale
- Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
- Multi-Touch Attribution
- Multimodal Deep Learning Applied to E-commerce Big Data
- Near Real-Time Analytics with Event Streaming Live Tables and Delta Sharing
- Nixtla: Deep Learning for Time Series Forecasting
- Opening the Floodgates Enabling Fast Unmediated End User Access to Trillion-Row Datasets with SQL Data Warehouses
- Operational Analytics: Expanding the Reach of Data in the Lakehouse Era
- Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Pinot
- Orchestration Made Easy with Databricks Workflows
- OvalEdge End-To-End Data Governance
- Patient Cohort Building with NLP and Knowledge Graphs
- Powering Up the Business with a Lakehouse
- Practical Data Governance in a Large Scale Databricks Environment
- Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
- Predicting and Preventing Machine Downtime with AI and Expert Alerts
- Propensity Scoring Demo
- Protecting Personally Identifiable Information (PII)/PHI Data in Data Lake via Column Level Encryption
- Pushing the limits of scale and performance for enterprise-wide analytics: A fire-side chat with Akamai
- PySpark in Apache Spark 3.3 and Beyond
- Radical Speed on the Lakehouse: Photon Under the Hood
- Real Time Bidding
- Real Time Retail Demo
- Real World Evidence and Propensity Score Matching
- Real-Time Search and Recommendation at Scale Using Embeddings and Hopsworks
- Real-time Risk Management with Confluent & Databricks
- Realize the Promise of Streaming with the Databricks Lakehouse Platform
- Recent Parquet Improvements in Apache Spark
- Regulatory Reporting: Automatically translate enterprise data models into efficient data pipelines
- Retail Industry Forum
- Rethinking Orchestration as Reconciliation Software-Defined Assets in Dagster
- Running a Low Cost Versatile Data Management Ecosystem with Apache Spark at Core
- SAS Migration
- Scaling AI Workloads with the Ray Ecosystem
- Scaling Deep Learning on Databricks
- Scaling ML at CashApp with Tecton
- Scaling Salesforce In-Memory Streaming Analytics Platform for Trillion Events Per Day
- Scaling Your Workloads with Databricks Serverless
- Search and Aggregations Made Easy with OpenSearch and NodeJS
- Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture
- Serving Near Real-Time Features at Scale
- Simplifying Migrations to Lakehouse—the Databricks Way
- Sink Framework Evolution in Apache Flink
- Smart Manufacturing Real-time Process Optimization with Databricks
- So Fresh and So Clean: Learn How to Build Real-Time Warehouses on Lakehouse
- Sound Data Engineering in Rust—From Bits to DataFrames
- Spark Inception: Exploiting the Apache Spark REPL to Build Streaming Notebooks
- Stadium Analytics
- Streaming Data into Delta Lake with Rust and Kafka
- Streaming ML Enrichment Framework Using Advanced Delta Table Features
- Supercharge your SaaS applications with a modern cloud-native database
- Survey of Production ML Tech Stacks
- Tackling Challenges of Distributed Deep Learning with Open Source Solutions
- Take Databricks Lakehouse to the Max with Informatica
- Technical and Tactical Football Analysis Through Data
- The Databricks Notebook Front Door of the Lakehouse
- The Future is Open - a Look at Google Cloud’s Open Data Ecosystem
- The Future of Data - What’s Next with Google Cloud
- The Road to a Robust Data Lake Utilizing Delta Lake and Databricks to Map 150 Million Miles of Roads a Month
- Tools for Assisted Apache Spark Version Migrations From 2.1 to 3.2+
- Towards Dynamic Microstructure The Role of Machine Learning in the Next Generation of Exchanges
- Tredence On Shelf Availability
- Turbocharge your AI/ML Databricks workflows with Precisely
- Turning Fan Data Into an Asset
- Unifying Data Science and Business Artificial Intelligence Augmentation and Integration into Production Business Applications
- What to Do When Your Job Goes OOM in the Night Flowcharts
- Why a Data Lakehouse is Critical During the Manufacturing Apocalypse
- You Have BI. Now What Activate Your Data
- Your fastest path to Lakehouse and beyond
- dbt + Machine Learning What Makes a Great Baton Pass
- dbt and Databricks: Analytics Engineering on the Lakehouse
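Copyright of this original article belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is prohibited.
Link to this article: Data + AI Summit 2022 ultra-HD video download (https://www.iteblog.com/archives/10187.html)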