E20-065 Online Practice Questions and Answers

Questions 4

Given an input vector of features, a Random Forests model performs a classification task and ends in a tie. How does the model handle this outcome?

A. The model will be rebuilt

B. A winner is chosen at random

C. The tree that caused the tie is discarded

D. One more tree is added to the forest
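One common way to resolve a tied majority vote is to pick among the tied classes at random. The following is a minimal sketch of that idea, assuming each tree's prediction is already available as a label; `majority_vote` is a hypothetical helper for illustration, not the API of any particular library.

```python
import random
from collections import Counter

def majority_vote(tree_predictions, rng=None):
    """Return the most-voted class; break ties uniformly at random."""
    rng = rng or random.Random()
    counts = Counter(tree_predictions)
    top = max(counts.values())
    # Every class that reached the top vote count is a candidate winner.
    tied = sorted(label for label, n in counts.items() if n == top)
    return rng.choice(tied)

# Four trees split 2-2 between "yes" and "no": either label may be returned.
print(majority_vote(["yes", "no", "yes", "no"], random.Random(0)))
# Clear majority: "yes" is the only candidate.
print(majority_vote(["yes", "yes", "no"]))
```

Passing a seeded `random.Random` makes the tie-break reproducible across runs.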

Browse 66 Q&As
Questions 5

A data engineer is asked to process several large datasets using MapReduce. Upon initial inspection the engineer realizes that there are complex interdependencies between the datasets.

Why is this a problem?

A. MapReduce works best on unstructured data

B. There is no problem; MapReduce accommodates all the data

C. MapReduce can only parse one file at a time.

D. MapReduce is not ideal when the processing of one dataset depends on another.

Questions 6

What is a characteristic of stop words?

A. Used in term frequency analysis

B. Include words such as "a", "an", and "the"

C. Meaningful words requiring a parser to stop and examine them

D. Don't occur often in text
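As an illustration of how stop words are usually handled in text preprocessing, here is a minimal sketch that filters them out before any term counting. The `STOP_WORDS` set is a small illustrative subset chosen for this example, not a standard list.

```python
# Illustrative subset only; real stop-word lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The engineer is processing a large dataset in the cluster"))
# ['engineer', 'processing', 'large', 'dataset', 'cluster']
```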

Questions 7

What is the most likely reason for an HBase table to contain millions of columns?

A. Data is imported from a relational database table

B. Data is stored in the column qualifier

C. There are thousands of column families

D. The column names are randomly generated

Questions 8

Which metric would be most helpful in identifying a node that may cause network disruption if the node were removed?

A. Degree

B. Closeness

C. Betweenness

D. PageRank
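To make the difference between these metrics concrete, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm on a small hypothetical graph: two triangles joined through a single bridge node `X`. The graph and the function name `betweenness` are assumptions made for this example.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {node: [neighbors]}. Returns unnormalized betweenness per node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical network: triangle A-B-C and triangle D-E-F, joined via X.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = betweenness(graph)
print(max(scores, key=scores.get), scores["X"], scores["C"])  # X 9.0 8.0
```

In this graph `X` has only two neighbors, fewer than `C` or `D`, yet every shortest path between the two triangles runs through it, so removing it disconnects the network.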

Questions 9

A hotel chain runs a simulation on room pricing. They want to estimate revenue, per hotel, within +/- $10 with 95% confidence (z_{α/2} = 1.96). The estimated revenue standard deviation is $5000 based on previous booking data.

What is the optimal number of simulation trials to run?

A. A 32-bit operating system was used

B. The same number of trials was used

C. A linear congruential generator (LCG) was used for pseudo-random number generation

D. Different seeds for the random number generator were used.
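For the figures given in the question text, the usual sample-size formula for estimating a mean to within a margin E is n >= (z * sigma / E)^2. The sketch below applies it, assuming independent trials and a normal approximation; `trials_needed` is a hypothetical helper, and exact fractions are used to avoid floating-point rounding in the final ceiling.

```python
import math
from fractions import Fraction

def trials_needed(z, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin,
    i.e. n >= (z * sigma / margin) ** 2."""
    return math.ceil((Fraction(z) * Fraction(sigma) / Fraction(margin)) ** 2)

# z = 1.96, standard deviation = $5000, margin = $10
n = trials_needed("1.96", "5000", "10")
print(n)  # 960400
```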

Questions 10

What is NOT a category of a NoSQL data store?

A. Columnar

B. Document

C. Key/Value

D. Flat File

Questions 11

What is a typical use of a UDF in Pig?

A. Creating functionality outside of what is provided by the built-in functions

B. Providing functional access to user-defined data in HDFS

C. Providing advanced analytics to Hadoop

D. Providing an interface from Pig to Microsoft Excel for easier data manipulation

Questions 12

You develop a Python script "logistic.py" to evaluate the logistic function, denoted f(y), for a given value y. The script is used in the following Pig code:

Register 'logistic.py' using jython as udf;

z = FOREACH y GENERATE $0, udf.logistic ($0);

DUMP z;

What is the expected output when the Pig code is executed?

A. 0

B. Jython is not a supported language

C. Value of f(y) for all y

D. Tuples (y, f(y))
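The question does not show the contents of the Python script. As a rough sketch, assuming f(y) is the standard logistic function 1 / (1 + e^(-y)), the script and the effect of the `FOREACH ... GENERATE $0, udf.logistic($0)` statement could be emulated in plain Python as follows. In an actual Jython UDF file the function would normally also carry Pig's `@outputSchema` decorator, which is omitted here so the example runs outside Pig. The input values are hypothetical.

```python
import math

def logistic(y):
    """Standard logistic function f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

# Emulate GENERATE $0, udf.logistic($0): one (y, f(y)) pair per input value.
inputs = [-2.0, 0.0, 2.0]  # hypothetical values of the first field
z = [(y, logistic(y)) for y in inputs]
for row in z:
    print(row)
```

Each generated row contains both the original field and the function value, because the `GENERATE` clause lists `$0` before the UDF call.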

Questions 13

You conduct a TFIDF analysis on 3 documents containing raw text and derive TFIDF("data", document 2) = 1.908. You know that the term "data" only appears in document 2.

What is the TF of "data" in document 2?

A. 2, based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3^2) = 0.954. Therefore, TFIDF = TF*0.954 = 1.908. TF will then round to 2.

B. 4, based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/1) = 0.477. Therefore, TFIDF = TF*0.477 = 1.908. TF will then round to 4.

C. 6, based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal 3/1 = 3. Therefore, TFIDF = TF/3 = 1.908. TF will then round to 6.

D. 11, based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/2) = 0.176. Therefore, TFIDF = TF*0.176 = 1.908. TF will then round to 11.
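The arithmetic behind this kind of question can be checked directly. The sketch below assumes the common definition IDF = log10(N / df), where N is the number of documents and df is the number of documents containing the term, and then solves TFIDF = TF * IDF for TF. The variable names are chosen for this example.

```python
import math

n_docs = 3          # documents in the corpus
docs_with_term = 1  # "data" appears in only one document
tfidf = 1.908       # given TFIDF value for "data"

idf = math.log10(n_docs / docs_with_term)  # log10(3/1)
tf = tfidf / idf                           # solve TFIDF = TF * IDF for TF
print(round(idf, 3), round(tf))  # 0.477 4
```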

Questions 14

Which problem type is best suited for simulation?

A. One with a few, non-random input variables

B. One that has a closed-form solution

C. One with numerous, non-random input variables

D. One that compares "what-if" scenarios

Questions 15

In multinomial logistic regression, what is used to calculate the probability of an outcome occurring?

A. Logistic function applied to a linear combination of the input and outcome variables

B. Linear regression applied to a combination of input variables

C. Linear regression applied to a combination of input and outcome variables

D. Logistic function applied to a linear combination of the input variables
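For reference, a multinomial logistic model is commonly written with the softmax function, the multi-class generalization of the logistic function, applied to one linear combination of the input variables per outcome class. The sketch below shows that computation; the coefficients, biases, and the helper name `class_probabilities` are hypothetical values chosen for illustration.

```python
import math

def class_probabilities(x, weights, biases):
    """Softmax over one linear combination of the inputs per class."""
    scores = [b + sum(w_i * x_i for w_i, x_i in zip(w, x))
              for w, b in zip(weights, biases)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for 3 outcome classes and 2 input variables.
weights = [[0.5, -1.0], [0.0, 0.0], [-0.5, 1.0]]
biases = [0.1, 0.0, -0.1]
probs = class_probabilities([2.0, 1.0], weights, biases)
print(probs, sum(probs))
```

Only the input variables enter the linear combinations; the outcome variable appears solely as the thing whose class probabilities are being estimated.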

Questions 16

What are the major components of the YARN architecture?

A. ResourceManager and NodeManager

B. Task Tracker and NameNode

C. HDFS, Tez, and Spark

D. Avro, ZooKeeper, and HDFS

Questions 17

If two of the communities are re-designated to be one community, how does that change the network characteristics?

Refer to the exhibit.

A. Neighborhood overlap would increase

B. Network diameter would decrease

C. Modularity would increase

D. Modularity would decrease

Questions 18

What best describes the meaning behind the phrase "Six Degrees of Separation"?

A. Ability to use about six hops to reach any other node in an extremely large social network

B. Erdos number of all scholars having written papers with Paul Erdos

C. Maximum number of edges between nodes in a graph with a diameter of six

D. Typical distance between nodes that are connected by triadic closure

Exam Code: E20-065
Exam Name: Advanced Analytics Specialist Exam for Data Scientists
Last Update: Apr 27, 2024
Questions: 66 Q&As