Follow the WeChat official account on big data architecture and case studies: 过往记忆大数据

Data + AI Summit 2022 Slides Download

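Data + AI Summit 2022 was held from June 27 to 30, 2022, in San Francisco, and attendees in China could follow it online. The event ran for four days: the first day was dedicated to training, and the main conference took place on the remaining days. There were more than 200 sessions, with speakers from industry, research, and academia. The conference was organized into six main tracks: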

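  • Data Analytics, BI and Visualization: learn about the latest data analytics, BI and visualization technologies, as well as solutions from customers and the community.
  • Data Engineering: dive into the latest in data engineering, from implementing data pipelines, to ETL and data quality frameworks, to DataOps.
  • Data Lakes, Data Warehouses and Data Lakehouses: learn the concepts and best practices behind the evolution of data lakes and data warehouses into data lakehouses.
  • Data Science, Machine Learning and MLOps: learn techniques and best practices for production data science and machine learning pipelines.
  • Data Security and Governance
  • Research: dedicated to academic and advanced industrial research, including large-scale schedulers, graphs, data analytics and machine learning systems.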

For the full conference agenda, see: https://databricks.com/dataaisummit/agenda

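The HD videos from this conference were shared a few days ago; if you need them, the post "Data + AI Summit 2022 HD Video Download" has the download links. This post collects the slides from the conference for anyone who wants them.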

How to Download the HD Slides

About 170 slide decks are currently available. To get them, follow the WeChat official account 过往记忆大数据 or Java与大数据架构.

Recommended Sessions

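Data + AI Summit 2022 has a large number of sessions, and not all of them will interest everyone, so I have picked out a dozen or so particularly substantive ones that I recommend watching: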

  • Apache Spark SQL Aggregate Improvement at Meta (Facebook)
  • Recent Parquet Improvements in Apache Spark
  • Spark Data Source V2 Performance Improvement: Aggregate Push Down
  • Deep Dive into the New Features of Apache Spark 3.2 and 3.3
  • Managing Straggler Executors at Apache Spark 3.3
  • Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
  • PySpark in Apache Spark 3.3 and Beyond
  • Delta Lake 2.0 Overview
  • Improving Interactive Querying Experience on Spark SQL
  • Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
  • Radical Speed on the Lakehouse: Photon Under the Hood
  • Deep Dive into Delta Lake
  • Presto 101: An Introduction to Open Source Presto
  • Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance
  • Advanced Migrations: From Hive to SparkSQL
  • Presto On Spark: A Unified SQL Experience

Sessions with Downloadable Slides

About 170 sessions have slides available for download:

  • A Modern Approach to Big Data for Finance
  • A Practitioner's Guide to Unity Catalog: A Technical Deep Dive
  • AI-Fueled Forecasting: The Next Generation of Financial Planning
  • AI-Powered Assortment Planning Solution
  • ALaSpark: Gousto's Recipe for Building Scalable PySpark Pipelines
  • Accelerating the Pace of Autism Diagnosis with Machine Learning Models
  • Achieve Machine Learning Hyper Productivity with Transformers and Hugging Face
  • Administrator Best Practices and Tips for Future-Proofing your Databricks Account
  • Advanced Migrations: From Hive to SparkSQL
  • Adversarial Drifts, Model Monitoring, and Feedback Loops: Building Human-in-the-Loop Machine Learning Systems for Content Moderation
  • Agile Data Engineering: Reliability and Continuous Delivery at Scale
  • Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse
  • An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
  • Apache Arrow Flight SQL: High Performance, Simplicity, and Interoperability for Data Transfers
  • Apache Spark SQL Aggregate Improvement at Meta (Facebook)
  • Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
  • Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance
  • Applied Predictive Maintenance in Aviation Without Sensor Data
  • Auto Encoder Decoder Based Anomaly Detection with the Lakehouse Paradigm
  • Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
  • Automating Model Lifecycle Orchestration with Jenkins
  • Automating Business Decisions Using Event Streams
  • Backfill Streaming Data Pipelines in Kappa Architecture
  • Best Practices of Maintaining High Quality Data
  • Big Data in the Age of Moneyball
  • Build an Enterprise Lakehouse for Free with Trino and Delta Lake
  • Building Enterprise Scale Data and Analytics Platforms at Amgen
  • Building Metadata and Lineage Driven Pipelines on Kubernetes
  • Building Production Ready Recommender Systems with Feature Stores
  • Building a Data Science as a Service platform in Azure with Databricks
  • Building a Lakehouse for Data Science at DoorDash
  • Building an Analytics Lakehouse at Grab
  • Building and Scaling Machine Learning Based Products in the World's Largest Brewery
  • Building Spatial Applications with Apache Spark and CARTO
  • Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
  • Case Study in Rearchitecting an On-Premises Pipeline in the Cloud
  • Challenges in Time Series Forecasting
  • Chaos Engineering in the World of Large-Scale Complex Data Flow
  • Cloud-Native Geospatial Analytics at JLL
  • Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
  • Computational Data Governance at Scale
  • Connecting the Dots with DataHub: Lakehouse and Beyond
  • Coral and Transport: Portable SQL and UDFs for the Interoperability of Spark and Other Engines
  • Correlation Over Causation: Cracking the Relationship Between User Engagement and User Happiness
  • Customer-centric Innovation to Scale Data & AI Everywhere
  • Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile it to PySpark
  • DBA Perspective: Optimizing Performance Table by Table
  • Data Boards: A Collaborative and Interactive Space for Data Science
  • Data-Centric Principles for AI Engineering
  • Data Lakehouse and Data Mesh: Two Sides of the Same Coin
  • DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine
  • Databricks Meets Power BI
  • Databricks and Enterprise Observability with Overwatch
  • Deep Dive into Delta Lake
  • Deep Dive into the New Features of Apache Spark 3.2 and 3.3
  • Delta Lake 2.0 Overview
  • Delta Sharing: A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
  • Democratizing Metrics at Airbnb
  • Designing Better MLOps Systems
  • Destination Lakehouse: All Your Data, Analytics and AI on One Platform
  • Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach
  • Discover Data Lakehouse With End-to-End Lineage
  • Disrupting the Prescription Drug Market with AI and Data
  • Distributed Machine Learning at Lyft
  • Doubling the Capacity of the Data Platform Without Doubling the Cost
  • Elixir: The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox
  • Embedding Privacy by Design Into Data Infrastructure Through Open Source Extensible Tooling
  • Enable Production ML with Databricks Feature Store
  • Enabling BI in a Lakehouse Environment
  • Enabling Learning on Confidential Data
  • Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
  • Evolution of Data Architectures and How to Build a Lakehouse
  • Fugue Tune: Distributed Hybrid Hyperparameter Tuning
  • FutureMetrics: Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
  • GIS Pipeline Acceleration with Apache Sedona
  • Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects
  • Hassle-Free Data Ingestion into the Lakehouse
  • How to Automate the Modernization and Migration of Your Data Warehousing Workloads to Databricks Lakehouse
  • How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
  • How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
  • How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
  • How To Use Databricks SQL for Analytics on Your Lakehouse
  • How socat and UNIX Pipes Can Help Data Integration
  • How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse
  • How to Build a Complete Security and Governance Solution Using Unity Catalog
  • How to Implement a Semantic Layer for Your Lakehouse
  • Implementing Data Governance 3.0 for the Lakehouse Era: Community-Led and Bottom-Up
  • Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
  • Implementing an End-to-End Demand Forecasting Solution Through Databricks and MLflow
  • Improving Apache Spark Structured Streaming Application Processing Time
  • Improving Interactive Querying Experience on Spark SQL
  • Improving patient care with Databricks
  • Ingesting data into Lakehouse with COPY INTO
  • Integrating Apache Superset into a B2B Platform: Why and How
  • Introducing Zipline: An Open Source Feature Engineering Platform
  • Learn to Efficiently Test ETL Pipelines
  • Lessons Learned from Deidentifying 700 Million Patient Notes
  • Low-Code Machine Learning on Databricks with AutoML
  • MLOps at DoorDash
  • MLflow Pipelines: Accelerating MLOps from Development to Production
  • Mapping Data Quality Concerns to Data Lake Zones
  • Meshing About with Databricks
  • Migrate and Modernize your Data Platform with Confluent and Databricks
  • Migrating Complex SAS Processes to Databricks: Case Study
  • Monitoring and Quality Assurance of Complex ML Deployments via Assertions
  • Mosaic: A Framework for Geospatial Analytics at Scale
  • Multimodal Deep Learning Applied to E-commerce Big Data
  • Near Real-Time Analytics with Event Streaming, Live Tables, and Delta Sharing
  • Obfuscating Sensitive Information from Spark UI and Logs
  • Open Source Powers the Modern Data Stack
  • Opening the Floodgates: Enabling Fast, Unmediated End-User Access to Trillion-Row Datasets with SQL Data Warehouses
  • Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Pinot
  • Polars: Blazingly Fast DataFrames in Rust and Python
  • Power to the SQL People: Python UDFs in DBSQL
  • Powering Up the Business with a Lakehouse
  • Practical Data Governance in a Large-Scale Databricks Environment
  • Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
  • Presto On Spark: A Unified SQL Experience
  • Privacy-Preserving Machine Learning and Big Data Analytics Using Apache Spark
  • Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLFlow
  • Protecting Personally Identifiable Information (PII)/PHI Data in Data Lake via Column-Level Encryption
  • PySpark in Apache Spark 3.3 and Beyond
  • Radical Speed on the Lakehouse: Photon Under the Hood
  • Real-Time Search and Recommendation at Scale Using Embeddings and Hopsworks
  • Real-Time Cost Reduction Monitoring and Alerting
  • Realize the Promise of Streaming with the Databricks Lakehouse Platform
  • Recent Parquet Improvements in Apache Spark
  • Rethinking Orchestration as Reconciliation: Software-Defined Assets in Dagster
  • Running a Low-Cost, Versatile Data Management Ecosystem with Apache Spark at Core
  • Scalable XGBoost on GPU Clusters
  • Scaling AI Workloads with the Ray Ecosystem
  • Scaling Your Workloads with Databricks Serverless
  • Scaling Deep Learning on Databricks
  • Scaling ML at CashApp with Tecton
  • Scaling Privacy: Practical Architectures and Experiences
  • Security Best Practices for Lakehouse
  • Self-Serve, Automated and Robust CDC pipeline using AWS DMS, DynamoDB Streams and Databricks Delta
  • Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture
  • Serving Near Real-Time Features at Scale
  • Setting up On-Shelf Availability Alerts at Scale with Databricks and Azure
  • Simplify Global DataOps and MLOps Using Okta's FIG Automation Library
  • Simplifying Migrations to Lakehouse—the Databricks Way
  • Smart Manufacturing: Real-Time Process Optimization with Databricks
  • So Fresh and So Clean: Learn How to Build Real-Time Warehouses on Lakehouse
  • Sound Data Engineering in Rust: From Bits to DataFrames
  • Spark Data Source V2 Performance Improvement: Aggregate Push Down
  • Spark Inception: Exploiting the Apache Spark REPL to Build Streaming Notebooks
  • Spline: Central Data Lineage Tracking Not Only For Spark
  • State-of-the-Art Natural Language Processing with Apache Spark NLP
  • Streaming ML Enrichment Framework Using Advanced Delta Table Features
  • Survey of Production ML Tech Stacks
  • Technical and Tactical Football Analysis Through Data
  • The Databricks Notebook: Front Door of the Lakehouse
  • The Modern Metadata Platform: What, Why and How
  • The Road to a Robust Data Lake
  • The Semantics of Biology: Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing on Apache Spark
  • Time Series Forecasting with PyCaret
  • Tools for Assisted Apache Spark Version Migrations, From 2.1 to 3.2+
  • Towards Dynamic Microstructure: The Role of Machine Learning in the Next Generation of Exchanges
  • Turning Big Biology Data into Insights on Disease: The Power of Circulating Biomarkers
  • Turning Fan Data Into an Asset
  • UIMeta: A 10X Faster Cloud-Native Spark History Server
  • Unifying Data Science and Business
  • Vision AI Animal Health Industry Use Cases Using Databricks on Azure
  • What to Do When Your Job Goes OOM in the Night: Flowcharts
  • X-FIPE: eXtended Feature Impact for Prediction Explanation
  • You Have BI. Now What? Activate Your Data
  • dbt Machine Learning: What Makes a Great Baton Pass
  • dbt and Python: Better Together
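Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); do not republish without permission.
Permalink: Data + AI Summit 2022 Slides Download (https://www.iteblog.com/archives/10189.html)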