
E20-007 Online Practice Questions and Answers

Question 4

Which process in text analysis can be used to reduce dimensionality?

A. Stemming

B. Parsing

C. Digitizing

D. Sorting
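
Stemming maps inflected forms of a word to a shared stem, so a term-document matrix built from stemmed tokens has fewer distinct columns. Below is a minimal sketch of that effect. It assumes a hypothetical, deliberately crude suffix-stripping rule set (`crude_stem` and `SUFFIXES` are illustrative names, not a real library's API). Real stemmers such as the Porter algorithm use far richer rules.

```python
# Hypothetical, deliberately crude suffix rules; checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary(tokens, stem=False):
    """Distinct terms, i.e. the columns a term-document matrix would need."""
    return {crude_stem(t) if stem else t for t in tokens}


tokens = "cluster clusters clustered clustering model models modeled modeling".split()
raw_vocab = vocabulary(tokens)
stemmed_vocab = vocabulary(tokens, stem=True)
print(len(raw_vocab), len(stemmed_vocab))  # 8 2
```

On this toy input, eight surface forms collapse to the two stems `cluster` and `model`.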

Question 5

Which key role for a successful analytic project can consult and advise the project team on the value of end results and how these will be used on a day-to-day basis?

A. Business User

B. Project Manager

C. Data Scientist

D. Business Intelligence Analyst

Question 6

When would you use a Wilcoxon Rank Sum test?

A. When you cannot make an assumption about the distribution of the populations

B. When the data can easily be sorted

C. When the populations represent the sums of other values

D. When the data cannot easily be sorted
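
The rank-sum test replaces each observation with its rank in the pooled sample, so it does not assume the populations are normally distributed. Below is a minimal sketch of the statistic only. It assumes two small invented samples and gives tied values the average of their ranks. It stops short of the p-value, which libraries such as SciPy (`scipy.stats.ranksums`, `scipy.stats.mannwhitneyu`) or R's `wilcox.test` compute.

```python
def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum(sample_a, sample_b):
    """Wilcoxon rank-sum statistic W: the sum of sample_a's ranks in the pooled data."""
    pooled = list(sample_a) + list(sample_b)
    return sum(average_ranks(pooled)[: len(sample_a)])


def mann_whitney_u(sample_a, sample_b):
    """The equivalent Mann-Whitney U: W minus its minimum possible value."""
    n_a = len(sample_a)
    return rank_sum(sample_a, sample_b) - n_a * (n_a + 1) / 2


a = [1.1, 2.5, 3.0]       # invented sample A
b = [2.5, 4.2, 5.0, 6.1]  # invented sample B
print(rank_sum(a, b), mann_whitney_u(a, b))  # 7.5 1.5
```

Only the ordering of the data matters here. The actual magnitudes never enter the statistic.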

Question 7

You submit a MapReduce job to a Hadoop cluster. However, you notice that although the job was successfully submitted, it is not completing.

What should be done to identify the issue?

A. Ensure TaskTracker is running

B. Ensure JobTracker is running

C. Ensure NameNode is running

D. Ensure DataNode is running

Question 8

While having a discussion with your colleague, this person mentions that they want to perform K-means clustering on text file data stored in HDFS.

Which tool would you recommend to this colleague?

A. Mahout

B. HBase

C. Scribe

D. Sqoop

Question 9

What would be considered "Big Data"?

A. An OLAP Cube containing customer demographic information about 100,000,000 customers

B. Daily log files from a web server that receives 100,000 hits per minute

C. Aggregated statistical data stored in a relational database table

D. Spreadsheets containing monthly sales data for a Global 100 corporation
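
To give a sense of scale for the web server in option B, the daily volume works out as follows. This assumes one log line per hit; real servers may write more or fewer lines per request.

```latex
100{,}000\ \frac{\text{hits}}{\text{min}} \times 60\ \frac{\text{min}}{\text{h}} \times 24\ \frac{\text{h}}{\text{day}} = 144{,}000{,}000\ \frac{\text{hits}}{\text{day}}
```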

Question 10

Refer to the exhibit.

You are using K-means clustering to classify customer behavior for a large retailer. You need to determine the optimum number of customer groups. You plot the within-sum-of-squares (wss) data as shown in the exhibit. How many customer groups should you specify?

A. 2

B. 3

C. 4

D. 8
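
The wss plot in this question supports the "elbow" heuristic. You compute the within-cluster sum of squares for a range of k values, then pick the k after which adding clusters stops reducing wss by much. Below is a minimal sketch under several assumptions: the data is 1-D, the initialization is hypothetical and deterministic (centers spread evenly across the sorted data), and the data is invented with three obvious groups. The exhibit's plot is not reproduced.

```python
def kmeans_1d(points, k, iterations=100):
    """Plain Lloyd's K-means on 1-D data; returns (centers, clusters)."""
    pts = sorted(points)
    # Hypothetical deterministic start: k centers spread evenly over the sorted data.
    step = max(k - 1, 1)
    centers = [pts[(i * (len(pts) - 1)) // step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (empty clusters keep theirs).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def within_ss(centers, clusters):
    """Within-cluster sum of squares: squared distance of each point to its center."""
    return sum((p - centers[i]) ** 2 for i, cluster in enumerate(clusters) for p in cluster)


def wss_curve(points, max_k):
    """wss for k = 1..max_k, i.e. the values an elbow plot shows."""
    curve = {}
    for k in range(1, max_k + 1):
        centers, clusters = kmeans_1d(points, k)
        curve[k] = within_ss(centers, clusters)
    return curve


DATA = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # invented: three obvious groups
for k, value in wss_curve(DATA, 5).items():
    print(k, round(value, 2))
```

On this toy data, wss falls from about 548 (k = 1) to 127.5 (k = 2) to 6 (k = 3). After that it only falls to 4.5 and 3, so the bend sits at k = 3. Real data rarely bends this cleanly. Production K-means tools usually use randomized or k-means++ starts with several restarts rather than the fixed start assumed here.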

Question 11

What does the R code nv <- v[v < 1000] do?

A. Selects the values in vector v that are less than 1000 and assigns them to the vector nv

B. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than

C. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv

D. Selects values of vector v less than 1000, modifies v, and makes a copy to nv
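
R's logical indexing has a rough Python analogue, sketched below with an invented vector. The sketch first builds an element-wise TRUE/FALSE mask, then keeps only the elements where the mask is true. Like the R expression, it creates a new sequence and leaves `v` untouched. One difference is not modeled: in R, each NA in `v` yields an NA in the result.

```python
v = [250, 1200, 999, 1000, 42]  # invented example vector

# Element-wise logical test, the analogue of R's `v < 1000`.
mask = [x < 1000 for x in v]

# Keep the elements whose mask entry is True, the analogue of R's `v[mask]`.
nv = [x for x, keep in zip(v, mask) if keep]

print(nv)  # [250, 999, 42]
print(v)   # [250, 1200, 999, 1000, 42]  (unchanged)
```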

Question 12

Refer to the exhibit.

You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?

A. The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and refit to get a better idea of the model's quality over typical data.

B. The R-squared is good. The model should perform well.

C. The extreme-valued outliers may negatively affect the model's performance. Remove them to see if the R-squared improves over typical data.

D. The observations seem to come from two different populations, but this model fits them both equally well.
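
R-squared is 1 minus the residual sum of squares divided by the total sum of squares around the mean. Below is a minimal sketch using invented numbers, not the exhibit's data. It shows how a couple of extreme-valued outcomes that happen to be predicted well inflate the total sum of squares. That can push R-squared toward 1 even when predictions on the typical points are poor.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented data: five typical outcomes that are predicted poorly...
typical_true = [10, 11, 12, 13, 14]
typical_pred = [12, 11, 14, 10, 13]
# ...plus two extreme outcomes that happen to be predicted exactly.
extreme_true = [100, 120]
extreme_pred = [100, 120]

r2_all = r_squared(typical_true + extreme_true, typical_pred + extreme_pred)
r2_typical = r_squared(typical_true, typical_pred)
print(round(r2_all, 4), round(r2_typical, 4))  # 0.9987 -0.8
```

The typical-only value comes out negative because, on that subset, these predictions do worse than simply predicting the subset's mean.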

Question 13

Refer to the exhibit.

Click on the calculator icon in the upper left corner. You are going into a meeting where you know your manager will have a question on your dataset -- specifically relating to customers that are classified as renters with good credit status.

In order to prepare for the meeting, you create a rule: RENTER => GOOD CREDIT. What is the confidence of the rule?

A. 63%

B. 41%

C. 18%

D. 73%
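
The confidence of an association rule is the support of both sides together divided by the support of the left-hand side. The exhibit's counts are not reproduced here, so only the formula for this rule is shown.

```latex
\operatorname{conf}(\text{RENTER} \Rightarrow \text{GOOD CREDIT})
  = \frac{\operatorname{support}(\{\text{RENTER}, \text{GOOD CREDIT}\})}{\operatorname{support}(\{\text{RENTER}\})}
  = \frac{\#\,\text{renters with good credit}}{\#\,\text{renters}}
```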

Question 14

Which data asset is an example of quasi-structured data?

A. Webserver log

B. XML data file

C. Database table

D. News article
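
Web server logs are line-oriented text whose fields become usable only after parsing. Below is a minimal sketch of that parsing step. It assumes the Apache Common Log Format; other servers and custom configurations use different layouts. The names `CLF_PATTERN` and `parse_log_line` are illustrative, not part of any library.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return one log line's fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_log_line(line)
print(fields["status"], fields["request"])  # 200 GET /apache_pb.gif HTTP/1.0
```

A line that deviates from the expected layout returns `None`. Handling such lines is part of the effort this kind of data demands.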

Question 15

What describes a true property of a Logistic Regression method?

A. Robust with redundant variables and correlated variables

B. Handles missing values well

C. Works well with discrete variables that have many distinct values

D. Works well with variables that affect the outcome in a discontinuous way

Question 16

Refer to the exhibit.

Click on the calculator icon in the upper left corner. You are given a list of pre-defined association rules:

A. RENTER => BAD CREDIT

B. RENTER => GOOD CREDIT

C. HOME OWNER => BAD CREDIT

D. HOME OWNER => GOOD CREDIT

E. FREE HOUSING => BAD CREDIT

F. FREE HOUSING => GOOD CREDIT

For your next analysis, you must limit your dataset based on rules with confidence greater than 60%. Which of the rules will be kept in the analysis?

A. Rules B and D

B. Rules A and F

C. Rules C and E

D. Rules D and E
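
Filtering rules by a confidence threshold amounts to computing each rule's confidence and keeping those strictly above the cutoff. Below is a minimal sketch using a hypothetical handful of (housing, credit) records. These records are invented, not the exhibit's table, so the rules kept here say nothing about the answer above. Exact fractions avoid floating-point surprises at the boundary.

```python
from fractions import Fraction

# Invented (housing, credit) records; the exhibit's real counts are not reproduced.
RECORDS = [
    ("RENTER", "GOOD CREDIT"), ("RENTER", "BAD CREDIT"), ("RENTER", "BAD CREDIT"),
    ("HOME OWNER", "GOOD CREDIT"), ("HOME OWNER", "GOOD CREDIT"), ("HOME OWNER", "BAD CREDIT"),
    ("FREE HOUSING", "BAD CREDIT"), ("FREE HOUSING", "GOOD CREDIT"),
]


def rule_confidence(records, antecedent, consequent):
    """Confidence of antecedent => consequent: count(both) / count(antecedent)."""
    with_antecedent = [r for r in records if r[0] == antecedent]
    both = [r for r in with_antecedent if r[1] == consequent]
    return Fraction(len(both), len(with_antecedent))


def rules_above(records, threshold):
    """Return (rule, confidence) pairs whose confidence is strictly above threshold."""
    kept = []
    for housing in ("RENTER", "HOME OWNER", "FREE HOUSING"):
        for credit in ("GOOD CREDIT", "BAD CREDIT"):
            conf = rule_confidence(records, housing, credit)
            if conf > threshold:
                kept.append((f"{housing} => {credit}", conf))
    return kept


for rule, conf in rules_above(RECORDS, Fraction("0.6")):
    print(rule, conf)
```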

Question 17

Which word or phrase completes the statement?

Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to ______________ .

A. Optimization and Predictive Modeling

B. Alerts and Queries

C. Structured Data and Data Sources

D. Sales and profit reporting

Question 18

You are performing a marketing analysis on baskets using the Apriori algorithm. Which measure is a ratio that describes how many more times two items are present together than would be expected if those two items are statistically independent?

A. Lift

B. Leverage

C. Support

D. Confidence
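
The measures named in the options can be compared on a tiny invented set of baskets. In the sketch below, lift is support(X and Y) divided by support(X) × support(Y), so a value of 1 means the items co-occur exactly as often as independence predicts. Leverage is the difference between those same two quantities rather than their ratio. The baskets and function names are illustrative assumptions, not output from an Apriori library.

```python
from fractions import Fraction

# Invented toy baskets for illustration only.
BASKETS = [
    {"beer", "chips"},
    {"beer", "chips"},
    {"beer"},
    {"milk"},
    {"milk", "chips"},
]


def support(baskets, items):
    """Fraction of baskets containing every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


def lift(baskets, x, y):
    """support(X and Y) / (support(X) * support(Y)); 1 means independence."""
    return support(baskets, x | y) / (support(baskets, x) * support(baskets, y))


def leverage(baskets, x, y):
    """support(X and Y) - support(X) * support(Y); 0 means independence."""
    return support(baskets, x | y) - support(baskets, x) * support(baskets, y)


print(lift(BASKETS, {"beer"}, {"chips"}))      # 10/9
print(leverage(BASKETS, {"beer"}, {"chips"}))  # 1/25
```

Here beer and chips appear together in 2/5 of baskets versus the 9/25 expected under independence. That gives a ratio of 10/9, slightly more often than chance.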

Exam Code: E20-007
Exam Name: Data Science and Big Data Analytics
Last Update: Apr 15, 2024
Questions: 198 Q&As