WPC 300 Final Exam

Question: In an agile approach to analytics, what is the first step of the process?

Answer: Perform business discovery

Question: In an ETL process, data is loaded into a final target database such as:

Answer: Data warehouse

Question: What are the four types of data analytics methods?

Answer: Descriptive, explanatory, predictive and prescriptive

Question: Which of the following is an example of secondary data?

Answer: Firm's proprietary data

Question: Which of the following data analysis models uses optimization techniques?

Answer: Prescriptive analytics

Question: Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance.

Answer: Prescriptive analytics

Question: Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using?

Answer: Descriptive analytics

Question: Costco wants to know how to stock its warehouses for a future pandemic and is using current sales data to project its needs. Which kind of analytical technique is it using?

Answer: Predictive analytics

Question: Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this?

Answer: Prescriptive analytics

Question: Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion?

Answer: All of the answer selections are correct.

Question: Deleting the grid lines in a chart

Answer: Increases the data-ink ratio

Question: When the lie-factor of a graphical chart is more than 1,

Answer: the size of the effect shown in the graph is bigger than the actual effect in the data.

Question: Which are useful principles for data visualization?

Answer: The graph suggests a possible true effect

Question: Which of the following statement(s) about charts is false?

Answer: None of the other answers are false

Question: In order for a chart to have graphical integrity, the lie factor must be:

Answer: close to 1
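
Tufte's lie factor divides the size of the effect shown in the graphic by the size of the effect in the data, where each effect is measured as the relative change |second - first| / first. A minimal sketch; the data values and bar heights below are hypothetical:

```python
def relative_change(first: float, second: float) -> float:
    """Size of an effect, measured as the relative change between two values."""
    return abs(second - first) / first


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(data_first, data_second)


# Hypothetical chart: sales grow from 100 to 150 units (a 50% increase),
# but the drawn bars grow from 2 cm to 4 cm (a 100% increase).
lf = lie_factor(100, 150, 2, 4)
print(lf)  # 2.0, so the chart exaggerates the effect
```

A chart whose bars are drawn in proportion to the data would give a lie factor of 1.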

Question: What best describes the nature of a rose diagram?

Answer: Plots data using a circular histogram

Question: In order for a chart to minimize graphical complexity, the data-ink ratio must be:

Answer: close to 1

Question: Which of the following violates the principle of data visualization?

Answer: The data-ink ratio should be higher than 1

Question: Which of the following statement(s) about charts is true?

Answer: Data ink can sometimes help tell a richer story

Question: Which of the following statements is a reason not to use a table for data visualization?

Answer: Tables cannot easily show trends

Question: Standard deviation of a normal data distribution is a _______.

Answer: measure of data dispersion

Question: The difference between the first and third quartiles is referred to as the ____________.

Answer: interquartile range

Question: Which of the following is an example of a sample?

Answer: The number of IT employees out of all employees working in an office of Google

Question: Which of the following is an example of a measure of dispersion?

Answer: variance

Question: Which of the following describes the standard deviation?

Answer: It is the square root of the variance.
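
The dispersion measures above (variance, standard deviation, interquartile range) can be checked with Python's standard `statistics` module. A minimal sketch on hypothetical data; note that quartile conventions differ between tools, so other software may report a slightly different IQR:

```python
import math
import statistics

# Hypothetical observations, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)   # population variance: mean squared deviation from the mean
std_dev = statistics.pstdev(data)       # population standard deviation
assert math.isclose(std_dev, math.sqrt(variance))  # SD is the square root of the variance

# Quartiles using Python's default ("exclusive") method
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1   # interquartile range: third quartile minus first quartile

print(variance, std_dev, iqr)
```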

Question: The ________ is the observation that occurs most frequently.

Answer: mode

Question: For a normal distribution, the mean is _______ the median.

Answer: equal to

Question: Which of the following describes a positively skewed histogram?

Answer: a histogram that tails off towards the right

Question: What are the three principles of describing data?

Answer: Center, spread and shape

Question: Which of the following is true for a median?

Answer: For an even number of observations, the median is the mean of the two middle numbers
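
The median rule for odd and even counts, plus the mode, can be sketched as follows; the data are hypothetical, and the hand-written `median` is compared against `statistics.median`:

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd = [3, 1, 7, 5, 9]        # hypothetical data, n = 5
even = [3, 1, 7, 5, 9, 11]   # hypothetical data, n = 6

print(median(odd))    # 5
print(median(even))   # 6.0, the mean of 5 and 7
print(statistics.mode([1, 2, 2, 3, 2, 4]))  # 2, the most frequent value
```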

Question: Which of the following is a difference between the t-distribution and the standard normal (z) distribution?

Answer: The t-distribution has a larger variance than the standard normal distribution

Question: What is the confidence level when the level of significance is 0.07?

Answer: 0.93

Question: The WPC Sports Company has noted that the size of an individual customer order is normally distributed with a mean of $100 and a standard deviation of $12. If a soccer team of 16 players were to place the next batch of orders, what would be the standard error of the mean?

Answer: 3
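
The standard error of the mean is the population standard deviation divided by the square root of the sample size. A quick check of the arithmetic:

```python
import math

sigma = 12   # population standard deviation of order size, in dollars
n = 16       # number of orders in the sample
standard_error = sigma / math.sqrt(n)   # 12 / 4
print(standard_error)  # 3.0
```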

Question: In order to reject the null hypothesis, the p-value must be less than the

Answer: Alpha (the significance level)

Question: You are collecting data via an online survey to improve educational standards at ASU. Which of the following methods will not result in data collection bias?

Answer: Anonymous data collection, hiding the ASU brand in the survey questions.

Question: When the sample size increases, the

Answer: width of the confidence interval decreases
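
The margin of error of a confidence interval for a mean shrinks with the square root of n. A minimal sketch, assuming a known population standard deviation (reusing the $12 from the WPC Sports question purely as an illustration) and a 95% level with z = 1.96:

```python
import math


def ci_half_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma (95% by default)."""
    return z * sigma / math.sqrt(n)


# Quadrupling n halves the margin of error.
for n in (16, 64, 256):
    print(n, round(ci_half_width(12, n), 2))
```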

Question: Which of the following is a continuous random variable?

Answer: The time to complete a specific task

Question: Which of the following is a Type-I error?

Answer: The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.

Question: Which of the following propositions describes an existing theory or belief?

Answer: Null hypothesis

Question: The central limit theorem states that even if the population is not normally distributed, the

Answer: distribution of the sample mean will still be approximately normal when the sample size is large
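
A small simulation illustrates the central limit theorem. This sketch assumes an exponential population (mean 1, standard deviation 1), which is strongly right-skewed; the sample size and number of repetitions are arbitrary choices:

```python
import math
import random
import statistics

random.seed(0)          # fixed seed so the run is reproducible
n = 50                  # observations per sample
reps = 2000             # number of repeated samples

# The population is exponential (mean 1, SD 1): skewed, not normal.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# The sample means center on the population mean (1.0) with spread close to
# sigma / sqrt(n) = 1 / sqrt(50), and their histogram is roughly bell-shaped.
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3), round(1 / math.sqrt(n), 3))
```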

Question: A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be:

Answer: $6,900 (1500 + 0.36 × 15000)
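
The prediction is just the fitted line evaluated at x = 15,000 miles. A minimal sketch of the model stated in the question:

```python
def predict_annual_cost(miles: float) -> float:
    """Fitted model from the question: y = 1500 + 0.36 * x."""
    intercept = 1500.0   # fixed annual cost, in dollars
    slope = 0.36         # additional dollars per mile driven
    return intercept + slope * miles


print(predict_annual_cost(15000))
```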

Question: Which of the following is true about multi-collinearity?

Answer: It is measured using a measure called variance inflation factor (VIF).
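
For predictor j, VIF = 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing that predictor on the other predictors. The sketch below covers only the two-predictor special case, where R²ⱼ is simply the squared correlation between the two predictors; the data are hypothetical. Thresholds of roughly 5 or 10 are commonly cited rules of thumb for flagging problematic multicollinearity.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is their squared correlation."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]   # hypothetical first predictor
x2 = [2, 4, 5, 4, 5]   # hypothetical second predictor, partly correlated with x1
print(round(vif_two_predictors(x1, x2), 2))  # 2.5
```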

Question: Which of the following assumptions is not true for multiple linear regression?

Answer: There will be a multi-collinearity effect.

Question: A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that:

Answer: They should hire a new statistician.

Question: The value of R-Squared always falls between ________ and ________, inclusive.

Answer: 0 and 1

Question: A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________.

Answer: an independent variable

Question: The unexplained variance in the regression analysis is also known as:

Answer: Residual variance

Question: What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable?

Answer: There is no linear relationship between profit and sales.

Question: Which of the following statements is true based on the regression equation IQ = 4.0 + 5.6 × Reading Level?

Answer: A one-point increase in reading level increases the predicted IQ by 5.6 points.

Question: The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statement is true?

Answer: 81% of the variation in the money spent on repairs is explained by the age of the vehicle
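
In a simple linear regression with a single predictor, the coefficient of determination equals the square of the correlation coefficient:

```python
r = 0.9                  # correlation between vehicle age and repair cost
r_squared = r ** 2       # share of variation explained by a one-predictor linear model
print(f"{r_squared:.0%}")  # 81%
```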

Question: A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer?

Answer: By utilizing a multiple logistic regression model developed by an in-house analyst

Question: In classification analysis, we are determining the probability of an observation ________.

Answer: belonging to a certain class or not

Question: The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.

Answer: Confusion matrix

Question: In logistic regression, the dependent variable y is defined as:

Answer: log(p / (1 - p))

Question: If you want to find out whether body weight, calorie intake, fat intake, and age influence the probability of having a heart attack (yes or no), which of the following kinds of analysis will help determine the answer?

Answer: Multiple logistic regression

Question: In classification problems, the primary source for accuracy estimation of the model is ________.

Answer: Confusion matrix
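
A confusion matrix for a binary classifier counts true positives, false negatives, false positives, and true negatives on test data; accuracy is (TP + TN) / total. A minimal sketch with hypothetical labels:

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Counts of true/false positives/negatives for binary labels (1 = positive)."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],
        "FN": counts[(1, 0)],
        "FP": counts[(0, 1)],
        "TN": counts[(0, 0)],
    }


# Hypothetical test-set labels and model predictions
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)
```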

Question: In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.

Answer: Logit

Question: Odds ratio is defined as ________, where p is the probability of success.

Answer: p / (1 - p)
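
Strictly speaking, p / (1 - p) is the odds of success; an odds ratio compares two such odds (for example, the odds for two groups). The logit is the natural log of the odds, and logistic regression models it as a linear function of the predictors. A minimal sketch:

```python
import math


def odds(p: float) -> float:
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)


def logit(p: float) -> float:
    """Log-odds, the quantity logistic regression models as linear in the predictors."""
    return math.log(odds(p))


def inverse_logit(z: float) -> float:
    """Maps a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


print(odds(0.8))                  # about 4: success is four times as likely as failure
print(logit(0.5))                 # 0.0: even odds
print(inverse_logit(logit(0.8)))  # recovers roughly 0.8
```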

Question: Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.

Answer: a binary categorical

Question: In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.

Answer: Training and validation/testing
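
A training/testing split shuffles the records and divides them into two non-overlapping sets. A minimal sketch; the 30% test fraction and the seed are arbitrary assumptions:

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle rows and split them into two mutually exclusive sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # local RNG so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training, testing)


rows = list(range(10))   # hypothetical record IDs
train, test = train_test_split(rows)
print(len(train), len(test))  # 7 3
```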

Question: Which of the following is a definition of distance between two clusters in a complete linkage clustering?

Answer: The distance between the most distant pair of objects, one from each group
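
Complete linkage takes the largest pairwise distance between the two clusters, whereas single linkage takes the smallest. A minimal sketch using Euclidean distance on hypothetical 2-D points:

```python
import math
from itertools import product


def complete_linkage(cluster_a, cluster_b):
    """Distance between the most distant pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points, for comparison."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


a = [(0, 0), (1, 0)]   # hypothetical 2-D points
b = [(4, 0), (5, 0)]
print(complete_linkage(a, b), single_linkage(a, b))  # 5.0 3.0
```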

Question: Which of the following is true of hierarchical clustering?

Answer: The data partition does not occur in a single step

Question: Which of the following is not an application of clustering analysis?

Answer: Crime prediction analysis

Question: Which of the following is true about k-means clustering?

Answer: We choose the value for k before doing the clustering analysis
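
In k-means, k is an input that the analyst fixes before the run; the algorithm then alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. A minimal sketch of Lloyd's algorithm; initializing with the first k points is a simplifying assumption (real implementations usually use random or k-means++ initialization):

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; k is fixed by the analyst before the run."""
    centroids = list(points[:k])   # naive initialization (assumption): first k points
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i, p=p: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (8, 8), (1, 0.5), (9, 9.5), (8.5, 9)]  # hypothetical data
centroids, clusters = kmeans(points, k=2)
print(clusters)
```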

Question: Which of the following is a false statement?

Answer: To predict sales from transactional data one should perform clustering analysis.

Question: In a cluster analysis, the distance between the clusters should be:

Answer: Maximized

Question: Which of the following is a step of agglomerative hierarchical clustering?

Answer: Joining the two clusters that are closest to each other
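
Agglomerative hierarchical clustering starts with every point in its own cluster and repeatedly merges the closest pair, so the partition builds up over many steps rather than in one. A minimal sketch on hypothetical 1-D values, stopping at a chosen number of clusters:

```python
def agglomerative(points, n_clusters, linkage=max):
    """Bottom-up clustering on 1-D values: start with singletons and repeatedly
    merge the two closest clusters. linkage=max gives complete linkage,
    linkage=min gives single linkage."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None   # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]                           # j > i, so index i is unaffected
    return clusters


print(agglomerative([1, 2, 10, 11, 30], n_clusters=2))  # [[1, 2, 10, 11], [30]]
```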

Question: Which of the following statements below is false about supervised/unsupervised data analysis?

Answer: Data is not labeled for supervised analysis

Question: In the Target story discussed in the lecture, why did Target send the teen daughter maternity ads?

Answer: Target's analytics model suggested she was pregnant based on her buying habits

Question: Which category of data mining would you use for spam filtering of emails?

Answer: Supervised

Question: Which of the following is not a component of the relational database?

Answer: Analysis

Question: Which of the following is a cloud service provider?

Answer: VMware

Question: When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"?

Answer: Traveler

Question: When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______.

Answer: JOIN

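A JOIN combines rows from two tables whose key columns match. A minimal sketch using Python's built-in `sqlite3` with an in-memory database; the `Traveler` and `Booking` tables and their rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Traveler (traveler_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Booking (booking_id INTEGER PRIMARY KEY, traveler_id INTEGER, flight TEXT);
    INSERT INTO Traveler VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Booking VALUES (10, 1, 'PHX-LAX'), (11, 2, 'PHX-SEA'), (12, 1, 'LAX-PHX');
""")

# Match each booking to its traveler through the shared traveler_id key
rows = conn.execute("""
    SELECT t.name, b.flight
    FROM Traveler AS t
    JOIN Booking AS b ON b.traveler_id = t.traveler_id
    ORDER BY b.booking_id
""").fetchall()
print(rows)
```
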
Question: You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table?

Answer: Airport Code

Question: The SQL code to extract only the first_name field for all records of the "Actor" table is:

Answer: SELECT first_name FROM Actor;

Question: _______ ensures that related data exist in parent table before allowing an entry into a child table.

Answer: Referential integrity
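
Referential integrity is typically enforced with a foreign key from the child table to the parent table's primary key. A minimal sketch with Python's `sqlite3`; the `Airport`/`Reading` schema is hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled

# Parent table: airport_code is the primary key
conn.execute("CREATE TABLE Airport (airport_code TEXT PRIMARY KEY, name TEXT)")
# Child table: each reading must reference an existing airport
conn.execute("""
    CREATE TABLE Reading (
        reading_id    INTEGER PRIMARY KEY,
        airport_code  TEXT NOT NULL REFERENCES Airport(airport_code),
        temperature_c REAL,
        wind_kph      REAL
    )
""")

conn.execute("INSERT INTO Airport VALUES ('PHX', 'Phoenix Sky Harbor')")
conn.execute("INSERT INTO Reading (airport_code, temperature_c, wind_kph) VALUES ('PHX', 41.0, 12.0)")

try:
    # 'XYZ' has no parent row in Airport, so referential integrity rejects it
    conn.execute("INSERT INTO Reading (airport_code, temperature_c, wind_kph) VALUES ('XYZ', 20.0, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```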

Question: "Google Docs" is an example of _______ in a cloud computing environment.

Answer: SaaS

Question: Which of the following tools help in periodic managerial decision-making?

Answer: OLAP

Question: Which of the following is an important task of a database management system?

Answer: Provides support such as performing maintenance and routine backups.

Question: Which of the following is not a requirement for an ETL architecture?

Answer: data quality

Question: Which of the following is not one of the processes involved in data cleaning?

Answer: Encrypting

Question: The extract function in ETL reads data from a

Answer: specified source database

Question: In the loading phase of an ETL tool, the transformed data is loaded into an end target, usually the _______.

Answer: Data warehouse

Question: Which of the following is not a standard practice in "Data Transformation" process of an ETL tool?

Answer: Data extraction from ERP

Question: Which of the following is an ETL vendor?

Answer: Teradata

Question: One of the processes in ETL is

Answer: Load

Question: Data transformation involves

Answer: data splitting and aggregation

Question: The final stage of an ETL process is:

Answer: Load
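
The extract, transform, and load stages can be sketched end to end in a few lines. In this sketch, an in-memory CSV string stands in for the source system and an in-memory SQLite database stands in for the data warehouse; the column names and records are hypothetical:

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source data; order 2 is missing its amount
SOURCE_CSV = """order_id,customer,region,amount
1,ana,West,100
2,ben,East,
3,cy,west,250
4,dee,East,80
"""


def extract(text):
    """Extract: read raw records from the specified source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, standardize, and aggregate the raw records."""
    totals = defaultdict(float)
    for row in rows:
        if not row["amount"]:                   # cleaning: drop records with a missing amount
            continue
        region = row["region"].strip().title()  # standardizing: 'west' -> 'West'
        totals[region] += float(row["amount"])  # aggregating: total sales per region
    return sorted(totals.items())


def load(records, conn):
    """Load: write the transformed records into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
result = warehouse.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(result)
```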

Question: In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source?

Answer: Competitors' data

Question: A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not.

Answer: spurious correlation

Question: Which of the following is true about A/B testing?

Answer: To increase conversion rate of your website traffic, A/B testing can be beneficial.
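
One common way to analyze an A/B test of conversion rates is a two-proportion z-test. A minimal sketch using only the standard library; the visitor and conversion counts are hypothetical:

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical results: 5,000 visitors per variant, 200 vs. 260 conversions
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(round(z, 2), round(p, 4))
```

If the p-value falls below the chosen alpha, the difference in conversion rates is treated as statistically significant.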

Question: After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________.

Answer: Observational Study

Question: A/B testing can help marketers to

Answer: All of the answers are correct

Question: An experiment is said to be double-blinded if _________

Answer: neither the subject nor those working with the subject is aware of who is being given which treatment

Question: Which of the following statements is NOT true about experimental studies to compare two treatments?

Answer: It is not easy to control uncertainties in the comparison.

Question: Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be

Answer: Money

Question: A sample study is mostly done

Answer: to estimate the parameters of the population.

Question: In the experimental design example "IQ Water", students are called _______.

Answer: experimental units

Question: The first step for any kind of A/B testing is

Answer: to develop a test plan for what you want to test.

Question: Over-reliance on the first piece of information is called ____________.

Answer: Anchoring bias

Question: The gambler's fallacy is ____________.

Answer: a clustering illusion

Question: When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________.

Answer: sunk-cost fallacy

Question: Which of the following statements is true? (a) Analytical thinking is not based on facts; (b) Heuristic thinking is slow; (c) Using intuition is a way of analytical thinking; (d) Experimentation is a way of analytical thinking.

Answer: Experimentation is a way of analytical thinking

Question: Which of the following biases cannot be categorized as a cognitive bias?

Answer: None of the answer selections are correct

Question: A person who is convinced he will gain admission to Harvard merely by applying is suffering from:

Answer: Overconfidence

Question: When you buy a new car, you value it more than the price you paid because of:

Answer: Endowment effect bias

Question: Which of the following is not a drawback of analytical decision making?

Answer: None of the answer selections are correct

Question: You bought a top-of-the-line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here?

Answer: Bandwagon effect

Question: What kinds of bias could show up when collecting data?

Answer: All of the answer selections are correct

Question: Which of the following statements is not true about artificial neural networks?

Answer: In the hidden layer of the networks, input data is hidden
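
A hidden layer is "hidden" because its values are intermediate results computed from the inputs, not observed directly as input or output; the input data itself is not hidden. A minimal forward-pass sketch; the weights and biases are arbitrary hypothetical values, and training (for example, by backpropagation) is not shown:

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each neuron: weighted sum of its inputs plus a bias, passed through a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x = [0.5, -1.0]                                                  # input layer: the raw features
hidden = dense_layer(x, [[0.2, -0.4], [0.7, 0.1]], [0.0, 0.1])   # hidden layer: intermediate values
output = dense_layer(hidden, [[1.5, -2.0]], [0.3])               # output layer: a score in (0, 1)
print(hidden, output)
```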

Question: Which of the following is an example of association rule learning?

Answer: How frequently an item set occurs in a transaction
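
Association rule learning is built on two measures: support (how often an itemset occurs across transactions) and confidence (how often the consequent appears when the antecedent does). A minimal sketch on hypothetical market-basket data:

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


print(support({"diapers", "beer"}, transactions))        # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))   # 0.75
```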

Question: An ideal machine learning process needs

Answer: All of the other answers are true.

Question: Which of the following examples is not an application of AI?

Answer: Predicting the exam score by scanning the appropriate textbook

Question: Which of the following techniques is a modern update of artificial neural networks?

Answer: Deep learning

Question: Which of the following statements below is true about supervised/unsupervised machine learning?

Answer: Supervised learning requires labeled data for training

Question: Which of the following is an example of unsupervised machine learning?

Answer: Clustering

Question: Artificial Intelligence _______

Answer: Is a broad science of mimicking human abilities

Question: AI is not embraced everywhere in every industry because _______.

Answer: It can be operationally expensive

Question: In developing spam filter algorithms, we need

Answer: Labeled data of both spam and non-spam emails
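
Naive Bayes is one common supervised technique for spam filtering, and it needs labeled examples of both classes to estimate word frequencies. A minimal sketch; the four training messages are hypothetical, tokenization is a simple whitespace split, and add-one (Laplace) smoothing is an assumed design choice:

```python
import math
from collections import Counter


def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the higher log-probability score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)   # prior from the labeled data
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(training)
print(classify("free prize", model), classify("project meeting monday", model))
```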

