When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?
A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value and when the reduce operation is both commutative and associative.
B. When the signature of the reduce method matches the signature of the combine method.
C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E. Never. Combiners and reducers must be implemented separately because they serve different purposes.
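The role of commutativity and associativity can be illustrated with a small, framework-free sketch. It does not use the Hadoop API; `run_job`, `total`, `mean`, and the sample pairs are hypothetical names and data made up for this illustration. A sum gives the same result with or without local combining, while a mean does not.

```python
from collections import defaultdict

def run_job(map_outputs, reduce_fn, combine_fn=None):
    """Simulate shuffle + reduce over per-mapper lists of (key, value) pairs."""
    shuffled = defaultdict(list)
    for pairs in map_outputs:
        if combine_fn is not None:
            # Combine locally on each mapper's output before the shuffle.
            local = defaultdict(list)
            for key, value in pairs:
                local[key].append(value)
            pairs = [(key, combine_fn(values)) for key, values in local.items()]
        for key, value in pairs:
            shuffled[key].append(value)
    return {key: reduce_fn(values) for key, values in shuffled.items()}

def total(values):
    # Commutative and associative: safe to reuse as a combiner.
    return sum(values)

def mean(values):
    # A mean of partial means is not the overall mean: unsafe as a combiner.
    return sum(values) / len(values)

# Made-up output of two mappers.
map_outputs = [[("a", 1), ("a", 2), ("b", 5)], [("a", 6), ("b", 7)]]

sum_plain = run_job(map_outputs, total)
sum_combined = run_job(map_outputs, total, combine_fn=total)
mean_plain = run_job(map_outputs, mean)
mean_combined = run_job(map_outputs, mean, combine_fn=mean)
```

In this sketch `sum_plain` and `sum_combined` are identical, whereas `mean_plain` and `mean_combined` disagree for key `"a"`, because the second mapper's single value is weighted as heavily as the first mapper's two values.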
Which of the following best defines a SequenceFile?
A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects
B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects
C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
A combiner reduces:
A. The number of values across different keys in the iterator supplied to a single reduce method call.
B. The amount of intermediate data that must be transferred between the mapper and reducer.
C. The number of input files a mapper must process.
D. The number of output files a reducer must produce.
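As a rough illustration of what a combiner changes, the following sketch counts how many intermediate records would leave each mapper in a word count, with and without local combining. It is a plain-Python simulation, not Hadoop code; `map_words`, `combine`, and the two input splits are assumptions invented for the example.

```python
from collections import Counter

def map_words(line):
    # Emit one (word, 1) pair per word, as a word-count mapper would.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Pre-aggregate counts per word on the mapper side.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

# Two made-up input splits, one per mapper.
splits = ["to be or not to be", "to do or not to do"]

without_combiner = [map_words(s) for s in splits]
with_combiner = [combine(pairs) for pairs in without_combiner]

# Number of records that would be sent to the reducers in each case.
records_without = sum(len(pairs) for pairs in without_combiner)
records_with = sum(len(pairs) for pairs in with_combiner)
```

Here `records_without` is 12 and `records_with` is 8. The number of input files and the number of reducer output files are the same in both cases.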
What types of algorithms are difficult to express in MapReduce v1 (MRv1)?
A. Algorithms that require applying the same mathematical function to large numbers of individual binary records.
B. Relational operations on large amounts of structured and semi-structured data.
C. Algorithms that require global, shared state.
D. Large-scale graph algorithms that require one-step link traversal.
E. Text analysis algorithms on large collections of unstructured text (e.g., Web crawls).
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
A. TaskTracker
B. NameNode
C. DataNode
D. JobTracker
E. Secondary NameNode
To use a Java user-defined function (UDF) with Pig, what must you do?
A. Define an alias to shorten the function name
B. Pass arguments to the constructor of the UDF's implementation class
C. Register the JAR file containing the UDF
D. Put the JAR file into the user's home folder in HDFS
Which one of the following statements is true about a Hive-managed table?
A. Records can only be added to the table using the Hive INSERT command.
B. When the table is dropped, the underlying folder in HDFS is deleted.
C. Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.
D. Hive dynamically defines the schema of the table based on the format of the underlying data.
To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What is the best way to accomplish this?
A. Serialize the data file, insert it into the JobConf object, and read the data into memory in the configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.
D. Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.
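The difference between loading side data in a per-task setup hook and loading it in the per-record method can be sketched without Hadoop. The class below is a hypothetical stand-in, not the Hadoop `Mapper` interface: `LookupMapper`, its `loads` counter, and the tab-separated sample file are assumptions made for this illustration.

```python
import os
import tempfile

class LookupMapper:
    """Hypothetical mapper: loads a side file once, then joins each record."""

    def __init__(self):
        self.lookup = {}
        self.loads = 0  # how many times the side file was read

    def configure(self, cached_path):
        # Called once per task, before any map() call.
        with open(cached_path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                self.lookup[key] = value
        self.loads += 1

    def map(self, key, value):
        # Called once per input record; only reads the in-memory table.
        return key, (value, self.lookup.get(key))

# Stand-in for a file that the framework has copied to the local node.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("k1\tone\nk2\ttwo\n")
    cached_path = tmp.name

mapper = LookupMapper()
mapper.configure(cached_path)
records = [("k1", "x"), ("k2", "y"), ("k3", "z")]
results = [mapper.map(k, v) for k, v in records]
os.remove(cached_path)
```

With this layout the file is read once per task regardless of how many records the task processes. Moving the read into `map` would repeat it for every record.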
Review the following data and Pig code:
What command to define B would produce the output (M,62,95102) when invoking the DUMP operator on B?
A. B = FILTER A BY (zip == '95102' AND gender == 'M');
B. B = FOREACH A BY (gender == 'M' AND zip == '95102');
C. B = JOIN A BY (gender == 'M' AND zip == '95102');
D. B = GROUP A BY (zip == '95102' AND gender == 'M');
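The effect of selecting rows by a boolean condition can be mirrored in a few lines of plain Python. The data for this question is not reproduced here, so the rows below are invented for illustration, and the field order (gender, age, zip) is an assumption inferred from the expected output tuple.

```python
# Hypothetical relation A with assumed fields (gender, age, zip).
A = [
    ("M", 62, "95102"),
    ("F", 41, "95102"),
    ("M", 30, "94040"),
]

# Keep only the rows for which the whole condition is true.
B = [row for row in A if row[2] == "95102" and row[0] == "M"]
```

With these made-up rows, `B` holds the single tuple `("M", 62, "95102")`. Grouping by the same expression would instead produce one bag per distinct value of the condition rather than a list of matching rows.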
Given the following Hive commands:
Which one of the following statements is true?
A. The file mydata.txt is copied to a subfolder of /apps/hive/warehouse
B. The file mydata.txt is moved to a subfolder of /apps/hive/warehouse
C. The file mydata.txt is copied into Hive's underlying relational database.
D. The file mydata.txt does not move from its current location in HDFS