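The papers below are classics in big data and distributed systems, covering CAP, BASE, 2PC, consensus protocols, consistent hashing, logical clocks, leases, and more. If you know of other good papers, feel free to suggest them in the comments below.
Distributed Systems Theory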
- Time, Clocks, and the Ordering of Events in a Distributed System
- Reaching Agreement in the Presence of Faults
- The Byzantine Generals Problem
- (CAP) Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
- (2PC) Concurrency Control and Recovery in Database Systems
- BASE: An Acid Alternative
- An Overview of Clock Synchronization
- Epidemic Algorithms for Replicated Database Maintenance
- Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency
- Weighted Voting for Replicated Data
- A Quorum-Consensus Replication Method for Abstract Data Types
- Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
Consistency
- (Paxos) The Part-Time Parliament
- Paxos Made Simple
- Paxos Made Practical
- Paxos Made Live - An Engineering Perspective
- Revisiting the Paxos algorithm
- Distributed Snapshots: Determining Global States of Distributed Systems
- Lightweight Asynchronous Snapshots for Distributed Dataflows
- ZooKeeper's atomic broadcast protocol: Theory and practice
- (Raft) In Search of an Understandable Consensus Algorithm
Google Papers
- The Google File System
- MapReduce: Simplified Data Processing on Large Clusters
- Bigtable: A Distributed Storage System for Structured Data
- The Chubby lock service for loosely-coupled distributed systems
- Dremel: Interactive Analysis of Web-Scale Datasets
- Omega: flexible, scalable schedulers for large compute clusters
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale
- Large-scale cluster management at Google with Borg
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- (Percolator) Large-scale Incremental Processing Using Distributed Transactions and Notifications
- Spanner: Google's Globally-Distributed Database
- F1: A Distributed SQL Database That Scales
- Pregel: A System for Large-Scale Graph Processing
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
General-Purpose Computing Frameworks
- MapReduce: Simplified Data Processing on Large Clusters
- Hive - A Warehousing Solution Over a Map-Reduce Framework
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Spark: Cluster Computing with Working Sets
Streaming
- Storm @Twitter
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale
- Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
- Apache Flink™: Stream and Batch Processing in a Single Engine
- Twitter Heron: Stream Processing at Scale
- Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
- S4: Distributed Stream Computing Platform
Message Queues
KV Stores & Databases
- Bigtable: A Distributed Storage System for Structured Data
- Dynamo: Amazon's Highly Available Key-value Store
- Cassandra - A Decentralized Structured Storage System
- Serving Large-scale Batch Computed Data with Project Voldemort
- Schema-Agnostic Indexing with Azure DocumentDB
Schedulers
- Apache Hadoop YARN: Yet Another Resource Negotiator
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling
Storage
- The Google File System
- The Hadoop Distributed File System
- Column-Stores vs. Row-Stores: How Different Are They Really?
- Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks
- RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
- (ORC) Major Technical Advancements in Apache Hive
- Erasure Codes for Storage Systems
Coordination
- The Chubby lock service for loosely-coupled distributed systems
- ZooKeeper: Wait-free coordination for Internet-scale systems
Others
- GraphX: Graph Processing in a Distributed Dataflow Framework
- MLlib: Machine Learning in Apache Spark
- Shark: SQL and Rich Analytics at Scale
- Spark SQL: Relational Data Processing in Spark
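Original article. Copyright belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Permalink: 【Classic Papers on Big Data and Distributed Systems】(https://www.iteblog.com/archives/2021.html)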