You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entries in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Nothing; the worker node will automatically join the cluster when the DataNode daemon is started.
B. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin refreshHadoop on the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin refreshNodes on the NameNode
D. Restart the NameNode
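The include-list rule this question turns on can be modeled in a few lines. This is a simplified Python sketch, not Hadoop code: the helper name datanode_may_register and the host names are hypothetical. It assumes the documented behavior that when dfs.hosts is unset or lists no hosts, the NameNode accepts any DataNode, while a non-empty include list restricts registration to the hosts it names.

```python
from typing import Optional, Set


def datanode_may_register(host: str, include_hosts: Optional[Set[str]]) -> bool:
    """Simplified model of the NameNode's dfs.hosts include-list check (hypothetical helper).

    include_hosts is None when no dfs.hosts file is configured.
    """
    # No include file, or an include file listing no hosts: any DataNode may register.
    if not include_hosts:
        return True
    # Otherwise only the hosts named in the include file are accepted.
    return host in include_hosts


# Hypothetical host names; the first call matches the question (no dfs.hosts entries).
print(datanode_may_register("worker07.example.com", None))                      # True
print(datanode_may_register("worker07.example.com", {"worker01.example.com"}))  # False
```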
Given:
You want to clean up this list by removing jobs whose state is KILLED. What command do you enter?
A. Yarn application kill application_1374638600275_0109
B. Yarn rmadmin refreshQueue
C. Yarn application refreshJobHistory
D. Yarn rmadmin kill application_1374638600275_0109
Assuming a cluster running HDFS and MapReduce version 2 (MRv2) on YARN with all settings at their defaults, what do you need to do when adding a new slave node to the cluster?
A. Nothing, other than ensuring that DNS (or the /etc/hosts files on all machines) contains an entry for the new node.
B. Restart the NameNode and ResourceManager daemons and resubmit any running jobs
C. Increase the value of dfs.number.of.nodes in hdfs-site.xml
D. Add a new entry to /etc/nodes on the NameNode host.
E. Restart the NameNode daemon.
You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough to fit into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of this file in this situation?
A. The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file's replication doesn't fall below two)
B. This file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster's replication values are restored
C. The file will remain under-replicated until the administrator brings that node back online
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
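The bookkeeping behind option D can be sketched as: count the live replicas of each block from what the DataNodes report, and flag any block below its target replication. This is a simplified Python model, not the NameNode's actual implementation; the function name, node names, and block ID are hypothetical, and the real NameNode also relies on heartbeat timeouts to decide that a DataNode is dead before scheduling new copies.

```python
from collections import defaultdict
from typing import Dict, List, Set


def under_replicated_blocks(
    block_reports: Dict[str, List[str]],  # DataNode -> block IDs it reported
    live_nodes: Set[str],                 # DataNodes still considered alive
    expected: Dict[str, int],             # block ID -> target replication factor
) -> Dict[str, int]:
    """Return {block_id: missing_replica_count} for blocks below their target."""
    replicas: Dict[str, int] = defaultdict(int)
    for node, blocks in block_reports.items():
        if node not in live_nodes:
            continue  # replicas on a dead DataNode no longer count
        for block in blocks:
            replicas[block] += 1
    return {
        block: target - replicas[block]
        for block, target in expected.items()
        if replicas[block] < target
    }


# sales.txt as a single hypothetical block on three DataNodes; dn2 then fails.
reports = {"dn1": ["blk_1"], "dn2": ["blk_1"], "dn3": ["blk_1"]}
print(under_replicated_blocks(reports, {"dn1", "dn3"}, {"blk_1": 3}))  # {'blk_1': 1}
```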
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which daemons need to be installed on your cluster's master nodes? (Choose two)
A. ResourceManager
B. DataNode
C. NameNode
D. JobTracker
E. TaskTracker
F. HMaster
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
A. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
B. Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the Impala shell on your gateway machine
C. Install the impalad daemon and the Impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
D. Install the impalad daemon, the statestored daemon, the catalogd daemon, and the Impala shell on your gateway machine
E. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the Impala shell on your gateway machine
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring for progress?
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. ApplicationManager
Your cluster's mapred-site.xml includes the following parameters
And your cluster's yarn-site.xml includes the following parameters
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
A. 4 GB
B. 17.2 GB
C. 24.6 GB
D. 8.2 GB
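The parameter listings this question refers to are not reproduced here, so no answer can be computed, but the arithmetic YARN applies can be sketched. Assumption: the NodeManager lets a container's virtual memory reach its physical allocation (mapreduce.map.memory.mb for a map task) multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1. The sketch ignores any rounding of the request up to a multiple of yarn.scheduler.minimum-allocation-mb, and the 2048 MB value is hypothetical, not taken from the question.

```python
def map_vmem_limit_mb(map_memory_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual-memory ceiling for one map container, in MB (simplified model).

    map_memory_mb   -- mapreduce.map.memory.mb (physical allocation)
    vmem_pmem_ratio -- yarn.nodemanager.vmem-pmem-ratio (default 2.1)
    """
    return map_memory_mb * vmem_pmem_ratio


# Hypothetical setting: a 2048 MB map container with the default ratio.
limit = map_vmem_limit_mb(2048)
print(f"{limit:.1f} MB (~{limit / 1024:.1f} GB)")  # 4300.8 MB (~4.2 GB)
```

Plugging the question's own mapreduce.map.memory.mb and vmem-pmem ratio into the same multiplication gives the limit it asks for.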
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02
A. nn02 becomes the standby NameNode and nn01 becomes the active NameNode
B. nn02 is fenced, and nn01 becomes the active NameNode
C. nn01 becomes the standby NameNode and nn02 becomes the active NameNode
D. nn01 is fenced, and nn02 becomes the active NameNode
Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?
A. ResourceManager
B. NodeManager
C. JobHistoryServer
D. ApplicationMaster
E. JobTracker
F. ApplicationManager
You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?
A. Decrease the io.sort.mb value to 0
B. Increase the io.sort.mb value to 1 GB
C. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records
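The tuning signal in this question comes from two map-side job counters, spilled records and map output records. A ratio near 1.0 means each record was written to disk once; larger values mean extra spill and merge passes over the same data. A minimal Python sketch with hypothetical counter values (in MRv2, io.sort.mb is the deprecated name of mapreduce.task.io.sort.mb):

```python
def spill_ratio(spilled_records: int, map_output_records: int) -> float:
    """Map-side spilled records per map output record.

    1.0 means every record was spilled exactly once (the best case);
    larger values mean extra spill and merge passes over the same data.
    """
    if map_output_records == 0:
        raise ValueError("no map output records")
    return spilled_records / map_output_records


# Hypothetical counter values before and after raising io.sort.mb.
before = spill_ratio(spilled_records=30_000_000, map_output_records=10_000_000)
after = spill_ratio(spilled_records=10_000_000, map_output_records=10_000_000)
print(before, after)  # 3.0 1.0
```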
You want a node to only swap Hadoop daemon data from RAM to disk when absolutely necessary. What should you do?
A. Delete the /swapfile file on the node
B. Set vm.swappiness to 0 in /etc/sysctl.conf
C. Set the ram.swap parameter to 0 in core-site.xml
D. Delete the /etc/swap file on the node
E. Delete the /dev/vmswap file on the node
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?
A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the ApplicationManager
C. We cannot say; the number of Mappers is determined by the developer
D. 30
E. 3
F. 10
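Each input split normally becomes one map task, so the question reduces to counting splits. The sketch below models FileInputFormat's split sizing, max(minSize, min(maxSize, blockSize)), together with its 1.1 slop factor for the last split, in plain Python. It assumes uncompressed text, default mapreduce.input.fileinputformat.split.minsize and split.maxsize values, and a hypothetical 128 MB block size; it is a model, not the Hadoop source.

```python
SPLIT_SLOP = 1.1  # FileInputFormat lets the final split run up to 10% over


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    # Modeled on FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_size: int, block_size: int) -> int:
    size = compute_split_size(block_size)
    splits, remaining = 0, file_size
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1  # whatever is left becomes the last split
    return splits


BLOCK = 128 * 1024 * 1024  # hypothetical 128 MB block size
files = [3 * BLOCK] * 10   # ten files, each filling exactly 3 blocks
print(sum(splits_for_file(f, BLOCK) for f in files))  # 30
```

One caveat the model makes visible: if a file's third block held less than about 10% of a block, the slop factor would fold it into the previous split, giving two splits for that file instead of three.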
What must you do if you are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes?
A. You must modify the configuration file on each of the six DataNode machines.
B. You must restart the NameNode daemon to apply the changes to the cluster
C. You must restart all six DataNode daemons to apply the changes to the cluster
D. You don't need to restart any daemon, as they will pick up changes automatically
E. You must modify the configuration files on the NameNode only. DataNodes read their configuration from the master nodes.
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. fsimage_N (where N reflects all transactions up to transaction ID N)
D. edits_N-M (where N-M specifies transactions between transaction ID N and transaction ID M)
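The naming scheme in options C and D determines how the NameNode rebuilds its metadata at startup: load the newest fsimage_N checkpoint, then replay the edits segments holding transactions after N. The Python sketch below models only that selection step. The file names are hypothetical and unpadded (real ones are zero-padded transaction IDs), and the sketch skips edits_inprogress_N files, which a real NameNode must also handle.

```python
import re
from typing import List, Tuple

FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")    # checkpoint through txid N
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")  # finalized segment, txids N..M


def startup_plan(names: List[str]) -> Tuple[str, List[str]]:
    """Return the newest fsimage_N and the edits segments to replay after it."""
    images = []
    for name in names:
        m = FSIMAGE_RE.match(name)
        if m:
            images.append((int(m.group(1)), name))
    if not images:
        raise ValueError("no fsimage checkpoint found")
    last_txid, image = max(images)

    segments = []
    for name in names:
        m = EDITS_RE.match(name)
        # Keep segments that contain transactions newer than the checkpoint.
        if m and int(m.group(2)) > last_txid:
            segments.append((int(m.group(1)), name))
    return image, [name for _, name in sorted(segments)]


# Hypothetical, unpadded names for readability.
names = ["VERSION", "seen_txid", "fsimage_100", "fsimage_200",
         "edits_101-200", "edits_201-250", "edits_251-300", "edits_inprogress_301"]
print(startup_plan(names))
# ('fsimage_200', ['edits_201-250', 'edits_251-300'])
```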