Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their defaults, what do you need to do when adding a new slave node to the cluster?
A. Nothing, other than ensuring that DNS (or /etc/hosts files on all machines) contains an entry for the new node.
B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs.
C. Increase the value of dfs.number.of.nodes in hdfs-site.xml.
D. Add a new entry to /etc/nodes on the NameNode host.
E. Restart the NameNode daemon.
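(For reference: with default settings there is no dfs.hosts include file, so a new slave only needs to be resolvable by hostname before its daemons start. A minimal sketch, assuming CDH-style packaged init scripts; the hostname and IP are hypothetical:

    # on every machine (or in DNS)
    echo "10.0.0.21  slave21.example.com slave21" >> /etc/hosts

    # on the new node, start the worker daemons
    sudo service hadoop-hdfs-datanode start
    sudo service hadoop-yarn-nodemanager start

No master daemon needs to be restarted.)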
You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
A. Add another master node to increase the number of nodes running the JournalNode, which increases the number of machines available to HA to form a quorum
B. Configure the cluster's disk drives with an appropriate fault tolerant RAID level
C. Run the ResourceManager on a different master from the NameNode in order to share the load of HDFS metadata processing
D. Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure
E. Set an HDFS replication factor that provides data redundancy, protecting against failure
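(For reference: only block replication protects against loss of the data itself; JournalNodes and RAID protect daemon availability and metadata. A minimal hdfs-site.xml sketch, where the value shown is simply the common default:

    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
)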
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which daemons need to be installed on your cluster's master nodes? (Choose two)
A. ResourceManager
B. DataNode
C. NameNode
D. JobTracker
E. TaskTracker
F. HMaster
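(For reference: on a correctly configured MRv2 master you can confirm the resident daemons with jps; the PIDs below are illustrative:

    $ jps
    2481 NameNode
    2730 ResourceManager
)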
You have converted your Hadoop cluster from a MapReduce version 1 (MRv1) architecture to a MapReduce version 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?
A. Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify 2 reduce tasks.
B. In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific job. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.
C. In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory they need by executing -D mapreduce.reduce.memory.mb=2048
D. In YARN, resource allocation is a function of virtual cores specified by the ApplicationMaster making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -D yarn.nodemanager.cpu-vcores=2
E. MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of "tasks" into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.
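(For reference: the generic-options syntax carries over unchanged to MRv2. A sketch with hypothetical jar and class names, assuming the driver uses ToolRunner/GenericOptionsParser:

    hadoop jar app.jar DriverClass -D mapreduce.job.reduces=2 /data/input /data/output
)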
Your cluster's mapred-site.xml includes the following parameters:
And your cluster's yarn-site.xml includes the following parameters:
What is the maximum amount of virtual memory allocated for each map task before YARN kills its Container?
A. 4 GB
B. 17.2 GB
C. 24.6 GB
D. 8.2 GB
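(For reference: the NodeManager kills a map container when its virtual memory exceeds mapreduce.map.memory.mb multiplied by yarn.nodemanager.vmem-pmem-ratio. With hypothetical values of 8192 MB and the default ratio of 2.1, for example, the limit would be 8192 MB x 2.1 = 17,203 MB, roughly 17.2 GB.)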
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs on the web servers and copy them into HDFS using curl
B. Ingest the web server logs into HDFS using Flume
C. Import all user clicks from your OLTP databases into Hadoop using Sqoop
D. Write a MapReduce job with the web servers as mappers and the Hadoop cluster nodes as reducers
E. Channel these clickstreams into Hadoop using Hadoop Streaming
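(For reference: a minimal Flume agent sketch for this pattern; the agent name, log path, and NameNode URI are hypothetical, and tailing via an exec source is just one common approach:

    agent.sources = weblog
    agent.channels = mem
    agent.sinks = hdfs-out

    agent.sources.weblog.type = exec
    agent.sources.weblog.command = tail -F /var/log/httpd/access_log
    agent.sources.weblog.channels = mem

    agent.channels.mem.type = memory

    agent.sinks.hdfs-out.type = hdfs
    agent.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/weblogs/%Y-%m-%d
    agent.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
    agent.sinks.hdfs-out.channel = mem

Running one such agent per web server (or a tiered collector topology) streams the logs into HDFS continuously.)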
Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?
A. Yes. The daemon will receive data from the NameNode to run Map tasks
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks
C. Yes. The daemon will receive Reduce tasks only
Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts property in the NameNode's configuration file. What results?
A. No new nodes can be added to the cluster until you specify them in the dfs.hosts file
B. Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster
C. Any machine running the DataNode daemon can immediately join the cluster
D. The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with the command dfsadmin -refreshNodes
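(For reference: when dfs.hosts is set, only the hosts listed in the named file may register with the NameNode. A sketch, where the file path is hypothetical:

    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.hosts.include</value>
    </property>

After editing the include file, apply it with hdfs dfsadmin -refreshNodes. When the property is left unset, any DataNode may join.)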
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass. She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PrivilegedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input
What is the cause of the error?
A. The Hadoop configuration files on the client do not point to the cluster
B. The directory name is misspelled in HDFS
C. The name of the driver has been spelled incorrectly on the command line
D. The output directory already exists
E. The user is not authorized to run the job on the cluster
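(For reference: the file:/ scheme in the error shows the client resolved the path against the local filesystem, which is the fallback when fs.defaultFS is not set. A correctly pointed client core-site.xml would contain something like this, with a hypothetical hostname:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://namenode.example.com:8020</value>
    </property>
)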
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. fsimage_N (where N reflects all transactions up to transaction ID N)
D. edits_N-M (where N-M specifies transactions between transaction ID N and transaction ID M)
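(For reference: these files live under current/ in the NameNode's dfs.namenode.name.dir. A checkpoint can be rendered to XML with the offline image viewer; the transaction ID below is hypothetical:

    hdfs oiv -i fsimage_0000000000000000042 -p XML -o fsimage.xml
)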