Spark: Distributed Computing Framework
- Spark aims to enable efficient distributed computing for applications that heavily utilize intermediate data, such as iterative algorithms and queries. 1m33s
- Resilient Distributed Datasets (RDDs) are the key abstraction in Spark, representing read-only, ordered collections of records. 2m56s
- Spark optimizes for locality through techniques like fusion and tiling to enhance parallel performance. 4m1s
- Transformations can have wide dependencies that require communication between nodes in a system, for example, a group by key operation. 7m41s
- If data is partitioned using the same hash partitioner, the runtime system can detect narrow dependencies and perform fusion, reducing communication overhead. 11m2s
- Spark maintains fault tolerance through lineage, which is a log of transformations applied to RDDs, allowing for data recreation from persistent storage. 12m11s
- If a node crashes during computation, the system recovers by rerunning the log of operations to recreate lost data partitions. This leverages data replication in HDFS and the coarse-grained nature of the operation log. 15m45s
- Spark offers significant performance improvements, often orders of magnitude faster than Hadoop, due to its use of in-memory processing and reduced reliance on disk access for iterative tasks. 18m30s
- Spark's ecosystem includes domain-specific frameworks like Spark SQL for database processing, Spark MLlib for machine learning, and Spark GraphX for graph operations, all built upon the underlying RDD abstraction. 20m30s
- Spark can be extended beyond transformations and actions to perform tasks like graph analysis and database operations. 23m26s
- If data fits in memory, using a distributed system like Spark is not recommended due to overheads, as a single system will be more efficient. 26m27s
Cache Memory: Exploiting Locality
- Modern processors dedicate a significant portion of their chip area to cache (30% or more) to enhance performance by exploiting locality. 28m31s
- Cache lines are used to exploit spatial locality in program behavior, which is the tendency to access memory locations that are close together. 29m58s
- The Intel Skylake chip, found in the myth machines, has a 32 kilobyte L1 data cache that is 8-way set associative, meaning that any given cache line can only be stored in one of eight places in the cache. 34m59s
- Conflict misses occur when the limited associativity of a cache prevents a cache line from being stored in a particular location, even if there are other available locations in the cache. 36m33s
- Set-associative caches are organized into sets, which can be visualized as buckets, with each set containing a fixed number of cache lines. 37m26s
- As cache size increases, the set associativity does not always increase proportionally. 38m40s
- A cache line consists of data and metadata. The metadata includes a tag, which stores the memory address of the data, and a dirty bit, indicating whether the data in the cache has been modified. 39m31s
- In a write-back cache, data is written to the cache first and only later written to the main memory. In contrast, a write-through cache writes data to both the cache and main memory simultaneously. 41m31s
- Write-allocate, used with write-back caches, fetches the entire cache line from main memory upon a write miss and then writes the data. 43m39s
- Right back caches write data to the cache and set a dirty bit, while right through caches write data to both the cache and memory simultaneously, eliminating the need for a dirty bit. 44m40s
- The tag in a cache represents the address of a cache line, not just a single variable, allowing for spatial locality. 49m25s
- Higher set associativity in a cache reduces miss rates but increases the complexity of cache lookups. 50m35s
Cache Coherence: Maintaining Data Consistency
- When multiple processors with individual caches are connected to a main memory, a memory coherence problem can occur. 52m52s
- If multiple processors read and write to the same memory address, inconsistencies can arise because updates to one cache might not be reflected in other caches or the main memory. 56m1s
- Determining the "last" value written to a memory address becomes complex in a multi-processor system because simultaneous writes from different processors need a clear order to ensure data consistency. 59m0s
- For any given memory location, access should be serialized to maintain coherence. This means that reads and writes to a specific address should happen in the order issued by the processor, ensuring subsequent reads reflect the last write to that location. 1h0m51s
- Two key invariants are crucial for cache coherence: the single writer multiple reader invariant, which allows only one processor to modify a cache line at a time while permitting multiple readers, and the data value invariant, which ensures that all readers see the last written value during a read-only epoch. 1h2m1s
- While software-based solutions using virtual memory pages are possible, they are slow and prone to issues like false sharing. Hardware-based solutions operating at the cache line level, such as snooping and directory-based systems, offer a more efficient approach to maintaining cache coherence. 1h6m47s
- A single shared cache for multiple processors can lead to a performance bottleneck due to bandwidth limitations and interference between processors. 1h7m52s
- Shared caches can experience both constructive interference, such as pre-fetching data for subsequent iterations, and destructive interference, where one processor's cache misses are caused by another processor. 1h8m46s
Snooping Cache Coherence: Using a Bus
- Snooping cache coherence schemes use an interconnect, such as a bus, to connect processors and their caches, allowing them to communicate and maintain data consistency. 1h11m2s
- Buses have two important properties for cache coherence: they allow only one transaction at a time, providing serialization, and they broadcast all transactions to all connected devices. 1h14m12s
- A bus in computer architecture functions as a broadcast medium, where all processors receive communication from a single source. 1h15m0s
- The MSI protocol, used in cache coherence, utilizes three states: Modified, Shared, and Invalid. 1h18m27s
- The MSI protocol ensures exclusive access for write operations by using a combination of processor operations (processor read, processor write) and bus transactions (bus read, bus read exclusive, bus write back). 1h19m14s








