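Data + AI Summit 2022 was held from June 27 to 30, 2022. The conference took place in San Francisco, and attendees in China could tune in online. It ran for four days: the first day was training, and the formal sessions followed on the remaining days. The summit featured more than 200 sessions, with speakers from industry, research, and academia, organized into six main tracks:
- Data analytics, BI, and visualization: learn about the latest data analytics, BI, and visualization technologies, along with solutions from customers and the community.
- Data engineering: dive into the latest in data engineering, from implementing data pipelines to managing data quality, ETL and data quality frameworks, and DataOps.
- Data Lakes, Data Warehouses and Data Lakehouses: learn the concepts and best practices behind the evolution of data lakes and data warehouses into Data Lakehouses.
- Data science, machine learning, and MLOps: learn techniques and best practices for productionizing data science and machine learning pipelines.
- Data security and governance
- Research: dedicated to academic and advanced industrial research, including large-scale schedulers, graphs, data analytics, and machine learning systems.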
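For the full conference agenda, see: https://databricks.com/dataaisummit/agenda
The HD videos from the summit were shared a few days ago; if you need them, see the post "Data + AI Summit 2022 HD Video Download" for the download link. This post collects the slide decks (PPT) from the summit for anyone who wants them.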
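How to download the HD slides
About 170 slide decks are currently available. Follow the WeChat official account 过往记忆大数据 or Java与大数据架构, then:
- Reply 10189 to get the Data + AI Summit 2022 HD slides;
- Reply 10187 to get the Data + AI Summit 2022 HD videos.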
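Recommended sessions
Data + AI Summit 2022 had a large number of sessions, and not all of them will interest everyone, so I have picked out a dozen or so of the most substantive talks that I recommend watching: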
- Apache Spark SQL Aggregate Improvement at Meta (Facebook)
- Recent Parquet Improvements in Apache Spark
- Spark Data Source V2 Performance Improvement: Aggregate Push Down
- Deep Dive into the New Features of Apache Spark 3.2 and 3.3
- Managing Straggler Executors at Apache Spark 3.3
- Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
- PySpark in Apache Spark 3.3 and Beyond
- Delta Lake 2.0 Overview
- Improving Interactive Querying Experience on Spark SQL
- Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
- Radical Speed on the Lakehouse: Photon Under the Hood
- Deep-Dive into Delta Lake
- Presto 101: An Introduction to Open Source Presto
- Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance
- Advanced Migrations: From Hive to SparkSQL
- Presto On Spark: A Unified SQL Experience
Sessions with downloadable slides
Slides can be downloaded for about 170 sessions.
- A Modern Approach to Big Data for Finance
- A Practitioner's Guide to Unity Catalog A Technical Deep Dive
- AI Fueled Forecasting The Next Generation of Financial Planning
- AI powered Assortment Planning Solution
- ALaSpark Gousto Recipe for Building Scalable PySpark Pipelines
- Accelerating the Pace of Autism Diagnosis with Machine Learning Models
- Achieve Machine Learning Hyper Productivity with Transformers and Hugging Face
- Administrator Best Practices and Tips for Future Proofing your Databricks Account
- Advanced Migrations From Hive to SparkSQL
- Adversarial Drifts, Model Monitoring, and Feedback Loops Building Human in the Loop Machine Learning Systems for Content Moderation
- Agile Data Engineering Reliability and Continuous Delivery at Scale
- Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse
- An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
- Apache Arrow Flight SQL High Performance, Simplicity, and Interoperability for Data Transfers
- Apache Spark SQL Aggregate Improvement at Meta (Facebook)
- Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
- Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance
- Applied Predictive Maintenance in Aviation Without Sensor Data
- Auto Encoder Decoder Based Anomaly Detection with the Lakehouse Paradigm
- Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
- Automating Model Lifecycle Orchestration with Jenkins
- Automating Business Decisions Using Event Streams
- Backfill Streaming Data Pipelines in Kappa Architecture
- Best Practices of Maintaining High Quality Data
- Big Data in the Age of Moneyball
- Build an Enterprise Lakehouse for Free with Trino and Delta Lake
- Building Enterprise Scale Data and Analytics Platforms at Amgen
- Building Metadata and Lineage Driven Pipelines on Kubernetes
- Building Production Ready Recommender Systems with Feature Stores
- Building a Data Science as a Service platform in Azure with Databricks
- Building a Lakehouse for Data Science at DoorDash
- Building an Analytics Lakehouse at Grab
- Building and Scaling Machine Learning Based Products in the World's Largest Brewery
- Building Spatial Applications with Apache Spark and CARTO
- Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
- Case Study in Rearchitecting an On Premises Pipeline in the Cloud
- Challenges in Time Series Forecasting
- Chaos Engineering in the World of Large Scale Complex Data Flow
- Cloud Native Geospatial Analytics at JLL
- Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
- Computational Data Governance at Scale
- Connecting the Dots with DataHub Lakehouse and Beyond
- Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines
- Correlation Over Causation Cracking the Relationship Between User Engagement and User Happiness
- Customer centric Innovation to Scale Data AI Everywhere
- Cutting the Edge in Fighting Cybercrime Reverse Engineering a Search Language to Cross Compile it to PySpark
- DBA Perspective Optimizing Performance Table by Table
- Data Boards A Collaborative and Interactive Space for Data Science
- Data Centric Principles for AI Engineering
- Data Lakehouse and Data Mesh Two Sides of the Same Coin
- DataFusion and Arrow Supercharge Your Data Analytical Tool with a Rusty Query Engine
- Databricks Meets Power BI
- Databricks and Enterprise Observability with Overwatch
- Deep Dive into Delta Lake
- Deep Dive into the New Features of Apache Spark
- Delta Lake Overview
- Delta Sharing A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
- Democratizing Metrics at Airbnb
- Designing Better MLOps Systems
- Destination Lakehouse All Your Data Analytics and AI on One Platform
- Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach
- Discover Data Lakehouse With End to End Lineage
- Disrupting the Prescription Drug Market with AI and Data
- Distributed Machine Learning at Lyft
- Doubling the Capacity of the Data Platform Without Doubling the Cost
- Elixir The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox
- Embedding Privacy by Design Into Data Infrastructure Through Open Source Extensible Tooling
- Enable Production ML with Databricks Feature Store
- Enabling BI in a Lakehouse Environment
- Enabling Learning on Confidential Data
- Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
- Evolution of Data Architectures and How to Build a Lakehouse
- Fugue Tune Distributed Hybrid Hyperparameter Tuning
- FutureMetrics Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
- GIS Pipeline Acceleration with Apache Sedona
- Git for Data Lakes How lakeFS Scales Data Versioning to Billions of Objects
- Hassle Free Data Ingestion into the Lakehouse
- How to Automate the Modernization and Migration of Your Data Warehousing Workloads to Databricks Lakehouse
- How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
- How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
- How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
- How To Use Databricks SQL for Analytics on Your Lakehouse
- How socat and UNIX Pipes Can Help Data Integration
- How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse
- How to Build a Complete Security and Governance Solution Using Unity Catalog
- How to Implement a Semantic Layer for Your Lakehouse
- Implementing Data Governance 3.0 for the Lakehouse Era Community Led and Bottom Up
- Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
- Implementing an End to End Demand Forecasting Solution Through Databricks and MLflow
- Improving Apache Spark Structured Streaming Application Processing Time
- Improving Interactive Querying Experience on Spark SQL
- Improving patient care with Databricks
- Ingesting data into Lakehouse with COPY INTO
- Integrating Apache Superset into a B2B Platform Why and How
- Introducing Zipline An Open Source Feature Engineering Platform
- Learn to Efficiently Test ETL Pipelines
- Lessons Learned from Deidentifying 700 Million Patient Notes
- Low Code Machine Learning on Databricks with AutoML
- MLOps at DoorDash
- MLflow Pipelines Accelerating MLOps from Development to Production
- Mapping Data Quality Concerns to Data Lake Zones
- Meshing About with Databricks
- Migrate and Modernize your Data Platform with Confluent and Databricks
- Migrating Complex SAS Processes to Databricks Case Study
- Monitoring and Quality Assurance of Complex ML Deployments via Assertions
- Mosaic A Framework for Geospatial Analytics at Scale
- Multimodal Deep Learning Applied to E commerce Big Data
- Near Real Time Analytics with Event Streaming, Live Tables, and Delta Sharing
- Obfuscating Sensitive Information from Spark UI and Logs
- Open Source Powers the Modern Data Stack
- Opening the Floodgates Enabling Fast Unmediated End User Access to Trillion Row Datasets with SQL Data Warehouses
- Optimizing Speed and Scale of User Facing Analytics Using Apache Kafka and Pinot
- Polars Blazingly Fast DataFrames in Rust and Python
- Power to the SQL People Python UDFs in DBSQL
- Powering Up the Business with a Lakehouse
- Practical Data Governance in a Large Scale Databricks Environment
- Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
- Presto On Spark A Unified SQL Experience
- Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark
- Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLFlow
- Protecting Personally Identifiable Information (PII) PHI Data in Data Lake via Column Level Encryption
- PySpark in Apache Spark 3.3 and Beyond
- Radical Speed on the Lakehouse Photon Under the Hood
- Real Time Search and Recommendation at Scale Using Embeddings and Hopsworks
- Real Time Cost Reduction Monitoring and Alerting
- Realize the Promise of Streaming with the Databricks Lakehouse Platform
- Recent Parquet Improvements in Apache Spark
- Rethinking Orchestration as Reconciliation Software Defined Assets in Dagster
- Running a Low Cost, Versatile Data Management Ecosystem with Apache Spark at Core
- Scalable XGBoost on GPU Clusters
- Scaling AI Workloads with the Ray Ecosystem
- Scaling Your Workloads with Databricks Serverless
- Scaling Deep Learning on Databricks
- Scaling ML at CashApp with Tecton
- Scaling Privacy Practical Architectures and Experiences
- Security Best Practices for Lakehouse
- Self Serve Automated and Robust CDC pipeline using AWS DMS DynamoDB Streams and Databricks Delta
- Serverless Kafka and Apache Spark in a Multi Cloud Data Lakehouse Architecture
- Serving Near Real Time Features at Scale
- Setting up On Shelf Availability Alerts at Scale with Databricks and Azure
- Simplify Global DataOps and MLOps Using Oktas FIG Automation Library
- Simplifying Migrations to Lakehouse—the Databricks Way
- Smart Manufacturing Real time Process Optimization with Databricks
- So Fresh and So Clean Learn How to Build Real Time Warehouses on Lakehouse
- Sound Data Engineering in Rust From Bits to DataFrames
- Spark Data Source V2 Performance Improvement Aggregate Push Down
- Spark Inception Exploiting the Apache Spark REPL to Build Streaming Notebooks
- Spline Central Data Lineage Tracking Not Only For Spark
- State of the Art Natural Language Processing with Apache Spark NLP
- Streaming ML Enrichment Framework Using Advanced Delta Table Features
- Survey of Production ML Tech Stacks
- Technical and Tactical Football Analysis Through Data
- The Databricks Notebook Front Door of the Lakehouse
- The Modern Metadata Platform What Why and How
- The Road to a Robust Data Lake 0
- The Semantics of Biology Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing on Apache Spark
- Time Series Forecasting with PyCaret
- Tools for Assisted Apache Spark Version Migrations, From 2.1 to 3.2+
- Towards Dynamic Microstructure The Role of Machine Learning in the Next Generation of Exchanges
- Turning Big Biology Data into Insights on Disease The Power of Circulating Biomarkers
- Turning Fan Data Into an Asset
- UIMeta A 10X Faster Cloud Native Spark History Server
- Unifying Data Science and Business
- Vision AI Animal Health Industry Use Cases Using Databricks on Azure
- What to Do When Your Job Goes OOM in the Night Flowcharts
- X FIPE eXtended Feature Impact for Prediction Explanation
- You Have BI Now What Activate Your Data
- dbt Machine Learning What Makes a Great Baton Pass
- dbt and Python Better Together
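This is an original article; copyright belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is prohibited.
Link to this article: Data + AI Summit 2022 PPT Download (https://www.iteblog.com/archives/10189.html)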