APACHE-HADOOP-DEVELOPER Online Practice Questions and Answers

Question 4

When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?

A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value, and when the reduce operation is both commutative and associative.

B. When the signature of the reduce method matches the signature of the combine method.

C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.

D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.

E. Never. Combiners and reducers must be implemented separately because they serve different purposes.

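To make option A concrete, here is a minimal word-count-style sketch (class and variable names are illustrative): the reduce operation is a commutative, associative sum and its input types match its output types, so the same class can be registered as both the reducer and the combiner.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Illustrative reducer: summing is commutative and associative, and the
    // input types (Text, IntWritable) match the output types, so it can also
    // safely serve as a combiner.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // In the driver:
    // job.setReducerClass(SumReducer.class);
    // job.setCombinerClass(SumReducer.class);  // the reducer doubles as the combiner
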
Question 5

What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

A. You will not be able to compress the intermediate data.

B. You will no longer be able to take advantage of a Combiner.

C. By using multiple reducers with the default HashPartitioner, output files may not be in globally sorted order.

D. There are no concerns with this approach. It is always advisable to use multiple reducers.

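To illustrate the concern in option C: with the default HashPartitioner, keys are routed to reducers by hash, so each part-r-NNNNN output file is sorted internally, but the set of files is not globally ordered. A minimal driver sketch (class and job names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class MultiReducerDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multi-reducer-example");
            // Keys go to reducer hash(key) % numReduceTasks, so each part-r-NNNNN
            // file is sorted by key internally, but concatenating the files does
            // not yield a globally sorted result.
            job.setNumReduceTasks(4);                        // part-r-00000 .. part-r-00003
            job.setPartitionerClass(HashPartitioner.class);  // already the default; shown for clarity
            // For globally sorted output you would instead sample the keys and use
            // TotalOrderPartitioner, or fall back to a single reducer.
        }
    }
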
Question 6

Which one of the following statements is FALSE regarding the communication between DataNodes and a federation of NameNodes in Hadoop 2.0?

A. Each DataNode receives commands from one designated master NameNode.

B. DataNodes send periodic heartbeats to all the NameNodes.

C. Each DataNode registers with all the NameNodes.

D. DataNodes send periodic block reports to all the NameNodes.

Question 7

You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce, but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?

A. SequenceFiles

B. Avro

C. JSON

D. HTML

E. XML

F. CSV

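As background on option A, here is a hedged sketch (class name and paths are hypothetical) of packing small binary files such as images into a SequenceFile as (file name, raw bytes) pairs:

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class ImagePacker {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/user/hypothetical/images.seq");  // hypothetical output path

            // Append each input image as (file name, raw bytes).
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (String name : args) {  // image paths passed on the command line
                    Path img = new Path(name);
                    byte[] bytes = new byte[(int) fs.getFileStatus(img).getLen()];
                    try (InputStream in = fs.open(img)) {
                        IOUtils.readFully(in, bytes, 0, bytes.length);
                    }
                    writer.append(new Text(name), new BytesWritable(bytes));
                }
            }
        }
    }
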
Question 8

You need to move a file titled "weblogs" into HDFS. When you try to copy the file, you can't. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?

A. Increase the block size on all current files in HDFS.

B. Increase the block size on your remaining files.

C. Decrease the block size on your remaining files.

D. Increase the amount of memory for the NameNode.

E. Increase the number of disks (or size) for the NameNode.

F. Decrease the block size on all current files in HDFS.

Question 9

Which two of the following are true about this trivial Pig program? (Choose two.)

A. The contents of myfile appear on stdout

B. Pig assumes the contents of myfile are comma delimited

C. ABC has a schema associated with it

D. myfile is read from the user's home directory in HDFS

Question 10

Which process describes the lifecycle of a Mapper?

A. The JobTracker calls the TaskTracker's configure() method, then its map() method, and finally its close() method.

B. The TaskTracker spawns a new Mapper to process all records in a single input split.

C. The TaskTracker spawns a new Mapper to process each key-value pair.

D. The JobTracker spawns a new Mapper to process all records in a single file.

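For context on the lifecycle in option B: the framework creates one Mapper instance per input split, and its run() method calls setup() once, map() once for every key-value pair in that split, and cleanup() once. A minimal sketch with illustrative types:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative mapper: one instance handles every record in a single input split.
    public class LifecycleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void setup(Context context) {
            // Called once per task, before any records are processed.
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Called once for every key-value pair in the split.
            context.write(value, key);
        }

        @Override
        protected void cleanup(Context context) {
            // Called once per task, after the last record in the split.
        }
    }
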
Question 11

Which of the following best defines a SequenceFile?

A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects

B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects

C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.

D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

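To make the typing concrete, here is a hedged sketch (the input path is hypothetical) that opens a SequenceFile, prints the single key class and single value class recorded in its header, and iterates over the key-value pairs:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFileDump {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            Path path = new Path("/user/hypothetical/data.seq");  // hypothetical input path

            try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(path))) {
                // The header records exactly one key class and one value class:
                // every key in the file has the key class, every value the value class.
                System.out.println("key class   = " + reader.getKeyClassName());
                System.out.println("value class = " + reader.getValueClassName());

                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                while (reader.next(key, value)) {
                    System.out.println(key + "\t" + value);
                }
            }
        }
    }
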
Question 12

Which two of the following statements are true about HDFS? (Choose two.)

A. An HDFS file that is larger than dfs.block.size is split into blocks

B. Blocks are replicated to multiple DataNodes

C. HDFS works best when storing a large number of relatively small files

D. Block sizes for all files must be the same size

Question 13

Which one of the following statements is false about HCatalog?

A. Provides a shared schema mechanism

B. Designed to be used by other programs such as Pig, Hive and MapReduce

C. Stores HDFS data in a database for performing SQL-like ad-hoc queries

D. Exists as a subproject of Hive

Question 14

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

A. Increase the parameter that controls minimum split size in the job configuration.

B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.

C. Set the number of mappers equal to the number of input files you want to process.

D. Write a custom FileInputFormat and override the method isSplitable to always return false.

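A minimal sketch of the approach in option D (class name is hypothetical): overriding isSplitable on a FileInputFormat to return false makes the framework hand each whole file to a single map task, no matter how many HDFS blocks it spans.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // Hypothetical input format: each input file becomes exactly one split,
    // so exactly one map task processes it, regardless of how many blocks it spans.
    public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;  // never split: one file -> one split -> one map task
        }

        @Override
        public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
                TaskAttemptContext context) {
            return new LineRecordReader();  // still read the file line by line
        }
    }

    // In the driver: job.setInputFormatClass(WholeFileTextInputFormat.class);
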
Question 15

Which describes how a client reads a file from HDFS?

A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.

C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.

D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

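For reference, here is the client-side view of that read path as a hedged sketch (the file path is hypothetical): FileSystem.open() obtains the block locations from the NameNode, and the bytes are then streamed directly from the DataNodes; the data never passes through the NameNode.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // open() asks the NameNode only for block locations; the data that
            // follows is read directly from the DataNodes holding each block.
            try (FSDataInputStream in = fs.open(new Path("/user/hypothetical/weblogs"));
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
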
Question 16

Consider the following two relations, A and B.

Which Pig statement combines A by its first field and B by its second field?

A. C = JOIN B BY a1, A by b2;

B. C = JOIN A by a1, B by b2;

C. C = JOIN A a1, B b2;

D. C = JOIN A $0, B $1;

Question 17

MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.

A. Health state checks (heartbeats)

B. Resource management

C. Job scheduling/monitoring

D. Job coordination between the ResourceManager and NodeManager

E. Launching tasks

F. Managing file system metadata

G. MapReduce metric reporting

H. Managing tasks

Question 18

When is the earliest point at which the reduce method of a given Reducer can be called?

A. As soon as at least one mapper has finished processing its input split.

B. As soon as a mapper has emitted at least one record.

C. Not until all mappers have finished processing all records.

D. It depends on the InputFormat used for the job.

Exam Name: Hadoop 2.0 Certification exam for Pig and Hive Developer
Last Update: Apr 26, 2024
Questions: 108 Q&As
