UNDERSTANDING THE MAP-REDUCE PROGRAMMING MODEL AND SYSTEM When talking about Map-Reduce, one may mean the Map-Reduce programming model (the narrow sense), or the Map-Reduce system (the wider sense) that provides the support for the programming model, as well as a rich set of system services and advanced features. Understanding the programming model is sufficient to write a semantically correct program. However, from our experiences, in order to write a program that performs well on a datacenter-scale system, it is often necessary to understand the map-reduce system. In the following, we describe the programming model (Section 2.1), system services (Section 2.2), and advanced features (Section 2.3), based on the original Map-Reduce paper [5] and the open source Hadoop implementation [11].2.1 Programming Model Figure 1 illustrates the Map-Reduce programming model using a word counting example. A programmer implements two functions, a Map function and a Reduce function. Their semantics are shown at the top of Figure 1. Conceptually, the input to the Map-Reduce computation consists of a list of (in_key, in_value) pairs. Each Map function call takes a pair of (in_key, in_value) as input and produces zero or more pairs of intermediate key-value, (mid_key, mid_value). Then, the system automatically performs a group-by operation on the intermediate mid_key. After that, each Reduce function call processes a group, i.e. a mid_key and a list of mid_values, and produces zero or more output results. In the word counting example shown in Figure 1, the Map function generates a (word, 1) as an intermediate result for each encoun-tered word, where mid_value = 1. The Reduce function simply sums up the list of mid_values to obtain the word count.39185
System Services The system provides three categories of services to facilitate the running of a Map-Reduce program across a large number of machines, as shown in Figure 2. (i) Distributed file system: the input and output are stored in a distributed file system to be globally accessible. (ii) Automatically distributing tasks: the system instantiates. Map and Reduce tasks across a large number of machines, partitions and assigns the input to inpidual Map tasks, then coordinates the copying of intermediate results (stored on local disks) from the machines running Map tasks to machines running Reduce tasks. A typical optimization is to co-locate a Map task to a machine (or rack) that also holds a copy of the assigned input partition in the distributed file system. (iii) Fault tolerance: the system monitors the task executions and re-executes tasks if they fail (due to machine crashes etc.). Since the input to a Reduce task may come from any of the Map tasks, it is necessary to wait for all Map tasks to finish before launching Reduce tasks. Near the end of the Map or the Reduce phase, the system schedules backup executions of the remaining in-progress tasks to protect against “stragglers”, which are machines taking anunusually long time to complete tasks due to permanent or transient hardware or software problems.
Advanced Features Figure 2 shows the many advanced components surrounding and supporting the Mapper and the Reducer computations. Typically,the default implementations of these components can be replaced with customized implementations. InputFormat extracts the input record (in_key, in_value) from the input (default: extracting a line from a text file). This gives part of the functionality of a database access method. Partitioner computes a Reduce task ID from mid_key, determining which local file to append an intermediate result (default: hash code(mid_key) modulo R). One may perform range based partitioning instead. The local files containing intermediate results are copied to the machine running the corresponding Reduce task. Then Sorter sorts the intermediate results to complete the group-by operation (default: merge sort). After the reduce computation, OutputFormat formats output results for writing to output files (default: out_key tab out_value per line).The network traffic of copying can be reduc by implementing Combiner, which is a partial reduce function, to be called on the intermediate results local on inpidual Map task machines. This is an optimization for reduce operations that are associative and commutative. For example, in the above word counting example, we can use the Reduce function as the Combiner.
上一篇:java异常错误处理英文文献和中文翻译
下一篇:评价推荐系统英文文献和中文翻译

移动码头的泊位分配问题英文文献和中文翻译

纤维素增强的淀粉-明胶聚...

多极化港口系统的竞争力外文文献和中文翻译

阻尼减震平台的设计英文文献和中文翻译

超精密自由抛光的混合机...

旋转式伺服电机的柔性电...

过程约束优化数控机床的...

STC89C52单片机NRF24L01的无线病房呼叫系统设计

浅论职工思想政治工作茬...

从政策角度谈黑龙江對俄...

浅谈高校行政管理人员的...

酵母菌发酵生产天然香料...

上海居民的社会参与研究

提高教育质量,构建大學生...

AES算法GPU协处理下分组加...

基于Joomla平台的计算机学院网站设计与开发

压疮高危人群的标准化中...