WPC 300 Final Exam
Question: In an agile approach to analytics, what is the first step of the process?
Answer: Perform business discovery
Question: In an ETL process, data is loaded into a final target database such as:
Answer: Data warehouse
Question: What are the four types of data analytical methods?
Answer: Descriptive, explanatory, predictive and prescriptive
Question: Which of the following is an example of secondary data?
Answer: Firm's proprietary data
Question: Which of the following data analysis models use optimization techniques?
Answer: Prescriptive analytics
Question: Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance.
Answer: Prescriptive analytics
Question: Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using?
Answer: Descriptive analytics
Question: Costco wants to know how to stock its warehouses for a future pandemic and is using current sales data to help project the needs. Which kind of analytical technique is it using?
Answer: Predictive analytics
Question: Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this?
Answer: Prescriptive analytics
Question: Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion?
Answer: All of the answer selections are correct.
Question: Deleting the grid lines in a chart
Answer: Increases the data-ink ratio
Question: When the lie-factor of a graphical chart is more than 1,
Answer: the size of the effect shown in the graph is bigger than the actual effect in the data.
Question: Which are useful principles for data visualization?
Answer: The graph suggests a possible true effect
Question: Which of the following statement(s) about charts is false?
Answer: None of the other answers are false
Question: In order for a chart to have graphical integrity, the lie factor must be:
Answer: close to 1
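The lie factor in the cards above can be checked numerically. This is a minimal sketch using the usual definition (size of the effect shown in the graphic divided by the size of the effect in the data); the function name and the example numbers are hypothetical.

```python
def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return effect_in_graphic / effect_in_data

# Hypothetical chart: the data grows by 50%, but the bars grow by 150%.
exaggerated = lie_factor(effect_in_graphic=1.5, effect_in_data=0.5)

# Hypothetical chart: the bars grow by the same 50% as the data.
honest = lie_factor(effect_in_graphic=0.5, effect_in_data=0.5)

print(exaggerated)  # 3.0, the graphic overstates the effect
print(honest)       # 1.0, the graphic has graphical integrity
```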
Question: What best describes the nature of a rose diagram?
Answer: Plots data using a circular histogram plot
Question: In order for a chart to minimize graphical complexity, the data-ink ratio must be:
Answer: close to 1
Question: Which of the following violates the principle of data visualization?
Answer: The data-ink ratio should be higher than 1
Question: Which of the following statement(s) about charts is true?
Answer: Data ink can sometimes help tell a richer story
Question: Which of the following statements is a reason not to use a table for data visualization?
Answer: Tables cannot easily show trends
Question: Standard deviation of a normal data distribution is a _______.
Answer: measure of data dispersion
Question: The difference between the first and third quartiles is referred to as the ____________.
Answer: interquartile range
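As a rough illustration of the interquartile range, the sketch below uses Python's standard `statistics.quantiles` on a hypothetical sample. Quartile conventions differ between tools, so other software may report slightly different values for the same data.

```python
import statistics

# Hypothetical sample, listed in sorted order for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The interquartile range is the distance between the third and first quartiles.
iqr = q3 - q1
print(q1, q3, iqr)  # 3.0 9.0 6.0
```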
Question: Which of the following is an example of a sample?
Answer: The number of IT employees out of all employees working in an office of Google
Question: Which of the following is an example of a measure of dispersion?
Answer: variance
Question: Which of the following describes the standard deviation?
Answer: It is the square root of the variance.
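The relationship between variance and standard deviation can be verified with the standard library. The data set here is hypothetical, and the population versions (`pvariance`, `pstdev`) are used as an assumption; the same relationship holds for the sample versions.

```python
import math
import statistics

# Hypothetical population of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # population standard deviation

# The standard deviation equals the square root of the variance.
print(variance, std_dev, math.isclose(std_dev, math.sqrt(variance)))
```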
Question: The ________ is the observation that occurs most frequently.
Answer: mode
Question: For a normal distribution, the mean is _______ to the median.
Answer: equal
Question: Which of the following describes a positively skewed histogram?
Answer: a histogram that tails off towards the right
Question: What are the three principles of describing data?
Answer: Center, spread and shape
Question: Which of the following is true for a median?
Answer: For an even number of observations, the median is the mean of the two middle numbers
Question: Which of the following is a difference between the t-distribution and the standard normal (z) distribution?
Answer: The t-distribution has a larger variance than the standard normal distribution
Question: What is the confidence level when the level of significance is 0.07?
Answer: 0.93
Question: The WPC Sports Company has noted that the size of individual "customer order" is normally distributed with a mean of $100 and standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean?
Answer: 3
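The answer above follows from the standard error formula, standard deviation divided by the square root of the sample size. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
import math

def standard_error(std_dev: float, sample_size: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return std_dev / math.sqrt(sample_size)

# Values from the question: sigma = $12, n = 16 players.
se = standard_error(std_dev=12, sample_size=16)
print(se)  # 3.0
```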
Question: In order to reject the null hypothesis, the p-value must be less than the
Answer: Alpha
Question: You are collecting data via an online survey to improve the education standard at ASU. Which of the following methods will not result in data collection bias?
Answer: Anonymous data collection that hides the ASU brand in the survey questions.
Question: When sample size increases
Answer: The width of the confidence interval decreases
Question: Which of the following is a continuous random variable?
Answer: The time to complete a specific task
Question: Which of the following is a Type-I error?
Answer: The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.
Question: Which of the following propositions describes an existing theory or belief?
Answer: Null hypothesis
Question: The central limit theorem states that even if the population is not normally distributed, the
Answer: distribution of the sample mean will still be normal when the sample size is large
Question: A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be:
Answer: 6900
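The prediction above is a direct substitution into the fitted line y = 1500 + 0.36x. A short sketch of that calculation (the function name is hypothetical; the coefficients come from the question):

```python
def predicted_annual_cost(miles: float) -> float:
    """Evaluate the regression model y = 1500 + 0.36 * x from the question."""
    intercept = 1500.0      # fixed annual cost in dollars
    cost_per_mile = 0.36    # slope: dollars per mile driven
    return intercept + cost_per_mile * miles

cost = predicted_annual_cost(15000)
print(round(cost, 2))  # 6900.0
```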
Question: Which of the following is true about multi-collinearity?
Answer: It is measured using a measure called variance inflation factor (VIF).
Question: Which of the following assumptions is not true for multiple linear regression?
Answer: There will be a multi-collinearity effect.
Question: A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that:
Answer: They should hire a new statistician.
Question: The value of R-Squared always falls between ________ and ________, inclusive.
Answer: 0 and 1
Question: A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________.
Answer: an independent variable
Question: The unexplained variance in the regression analysis is also known as:
Answer: Residual variance
Question: What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable?
Answer: There is no linear relationship between profit and sales.
Question: Which of the following statements is true based on the following regression equation? IQ = 4.0 + Reading Label * 5.6
Answer: A one-unit increase in Reading Label will increase IQ by 5.6 points.
Question: The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true?
Answer: 81% of the variation in the money spent on repairs is explained by the age of the vehicle
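The 81% figure is the coefficient of determination, which for a simple linear regression with one predictor equals the square of the correlation coefficient. A quick check of the arithmetic:

```python
# Correlation coefficient from the question.
r = 0.9

# With a single predictor, R-squared is the square of r.
r_squared = r ** 2
print(f"{r_squared:.0%}")  # 81%
```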
Question: A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer?
Answer: By utilizing a multiple logistic regression model developed by an in-house analyst
Question: In classification analysis, we are determining the probability of an observation ________.
Answer: To be part of a certain class or not
Question: The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.
Answer: Confusion matrix
Question: In logistic regression, the dependent variable y is defined as:
Answer: log(p/(1-p))
Question: If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kind of analysis will help determine the answer?
Answer: Multiple logistic regression
Question: In classification problems, the primary source for accuracy estimation of the model is ________.
Answer: Confusion matrix
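To make the confusion matrix concrete, the sketch below tallies the four cells for a binary classifier and derives accuracy from them. The labels and predictions are invented for illustration, and the helper name is hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a binary problem with labels 0 and 1."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # actual positive, predicted positive
        "FN": counts[(1, 0)],  # actual positive, predicted negative
        "FP": counts[(0, 1)],  # actual negative, predicted positive
        "TN": counts[(0, 0)],  # actual negative, predicted negative
    }

# Hypothetical test set with known outcomes and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3} 0.75
```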
Question: In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.
Answer: Logit
Question: Odds ratio is defined as ________, where p is the probability of success.
Answer: p/(1-p)
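The parentheses matter in these expressions: the denominator is the whole quantity 1 - p. A small sketch of the ratio and of the logit built from it, using the natural logarithm (function names are hypothetical):

```python
import math

def odds(p: float) -> float:
    """Ratio of the probability of success to the probability of failure."""
    return p / (1 - p)

def logit(p: float) -> float:
    """Natural log of the odds, the transformed dependent variable in logistic regression."""
    return math.log(odds(p))

print(odds(0.75))   # 3.0, success is three times as likely as failure
print(logit(0.5))   # 0.0, even odds give a logit of zero
```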
Question: Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.
Answer: a binary categorical
Question: In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.
Answer: Training and validation/testing
Question: Which of the following is a definition of distance between two clusters in a complete linkage clustering?
Answer: The distance between the most distant pair of objects, one from each group
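Complete linkage can be sketched in a few lines: compute every cross-cluster pairwise distance and keep the largest. The points below are hypothetical, and Euclidean distance is an assumption; other distance measures can be substituted.

```python
import math
from itertools import product

def complete_linkage_distance(cluster_a, cluster_b):
    """Largest Euclidean distance over all pairs with one point from each cluster."""
    return max(math.dist(a, b) for a, b in product(cluster_a, cluster_b))

# Two hypothetical clusters of 2-D points.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (5, 0)]

d = complete_linkage_distance(cluster_a, cluster_b)
print(d)  # 5.0, from the farthest pair (0, 0) and (5, 0)
```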
Question: Which of the following is true of hierarchical clustering?
Answer: The data partition does not occur in a single step
Question: Which of the following is not an application of clustering analysis?
Answer: Crime prediction analysis
Question: Which of the following is true about k-means clustering?
Answer: We choose the value for k before doing the clustering analysis
Question: Which of the following is a false statement?
Answer: To predict sales from transactional data one should perform clustering analysis.
Question: In a cluster analysis, the distance between the clusters should be:
Answer: Maximized
Question: Which of the following is a step of agglomerative hierarchical clustering?
Answer: By joining two clusters that are closest to each other
Question: Which of the following statements below is false about supervised/unsupervised data analysis?
Answer: Data is not labeled for supervised analysis
Question: In the Target story discussed in the lecture, why did Target send the teen daughter maternity ads?
Answer: Target's analytics model suggested she was pregnant based on her buying habits
Question: Which of the following categories of data mining would you use for spam filtering of emails?
Answer: Supervised
Question: Which of the following is not a component of the relational database?
Answer: Analysis
Question: Which of the following is a cloud service provider?
Answer: VMware
Question: When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"?
Answer: Traveler
Question: When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______.
Answer: INNER JOIN
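A minimal sketch of an INNER JOIN, run against an in-memory SQLite database through Python's `sqlite3` module. The `airport` and `reading` tables, their columns, and the sample rows are all hypothetical; only rows whose key appears in both tables are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airport (airport_code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE reading (
        reading_id INTEGER PRIMARY KEY,
        airport_code TEXT REFERENCES airport(airport_code),
        temperature_f REAL
    );
    INSERT INTO airport VALUES ('PHX', 'Phoenix'), ('SEA', 'Seattle');
    INSERT INTO reading (airport_code, temperature_f)
        VALUES ('PHX', 105.0), ('PHX', 99.5);
""")

# The identifier key airport_code connects the two tables.
rows = conn.execute("""
    SELECT a.city, r.temperature_f
    FROM airport AS a
    INNER JOIN reading AS r ON a.airport_code = r.airport_code
    ORDER BY r.reading_id
""").fetchall()
conn.close()

print(rows)  # [('Phoenix', 105.0), ('Phoenix', 99.5)]; Seattle has no readings
```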
Question: You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table?
Answer: Airport Code
Question: The SQL code to extract only first_name information for all records of the "Actor" table below is:
Answer: SELECT first_name FROM Actor;
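The query above can be tried against an in-memory SQLite database. The table referenced in the question is not reproduced here, so the `Actor` columns other than `first_name` and the sample rows below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; only first_name is taken from the question.
conn.execute(
    "CREATE TABLE Actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO Actor (first_name, last_name) VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Select a single column for every record in the table.
first_names = [row[0] for row in conn.execute("SELECT first_name FROM Actor;")]
conn.close()

print(sorted(first_names))  # ['Ada', 'Alan']
```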
Question: _______ ensures that related data exist in parent table before allowing an entry into a child table.
Answer: Referential integrity
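Referential integrity can be observed with a foreign key constraint. In this sketch (hypothetical `parent` and `child` tables, SQLite via Python's `sqlite3`), a child row is accepted only when its parent row already exists. SQLite needs `PRAGMA foreign_keys = ON` for the check to be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER NOT NULL REFERENCES parent(id))"
)

conn.execute("INSERT INTO parent (id) VALUES (1)")
conn.execute("INSERT INTO child (parent_id) VALUES (1)")  # parent 1 exists, so this is allowed

try:
    conn.execute("INSERT INTO child (parent_id) VALUES (99)")  # no parent with id 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
conn.close()
print(rejected, child_count)  # True 1
```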
Question: "Google Docs" is an example of _______ in a cloud computing environment.
Answer: SaaS
Question: Which of the following tools help in periodic managerial decision-making?
Answer: OLAP
Question: Which of the following is an important task of a database management system?
Answer: Provides support such as performing maintenance and routine backups.
Question: Which of the following is not a requirement for an ETL architecture?
Answer: data quality
Question: Which of the following is not one of the processes involved in data cleaning?
Answer: Encrypting
Question: The extract function in ETL reads data from
Answer: a specified source database
Question: In the loading phase of an ETL tool, the transformed data is loaded into an end target, usually the _______.
Answer: Data warehouse
Question: Which of the following is not a standard practice in the "Data Transformation" process of an ETL tool?
Answer: Data extraction from ERP
Question: Which of the following is an ETL vendor?
Answer: Teradata
Question: One of the processes in ETL is
Answer: Load
Question: Data transformation involves
Answer: data splitting and aggregation
Question: The final stage of an ETL process is:
Answer: Load
Question: In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source?
Answer: Competitors' data
Question: A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not.
Answer: spurious correlation
Question: Which of the following is true about A/B testing?
Answer: To increase conversion rate of your website traffic, A/B testing can be beneficial.
Question: After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________.
Answer: Observational Study
Question: A/B testing can help marketers to
Answer: All of the answers are correct
Question: An experiment is said to be double-blinded if _________
Answer: neither the subject nor those working with the subject is aware of who is being given which treatment
Question: Which of the following statements is NOT true about experimental studies to compare two treatments?
Answer: It is not easy to control uncertainties in the comparison.
Question: Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be
Answer: Money
Question: A sample study is mostly done
Answer: to estimate the parameters of the population.
Question: In the experimental design example "IQ Water", students are called _______.
Answer: experimental units
Question: The first step for any kind of A/B testing is
Answer: to develop a test plan for what you want to test.
Question: Over-reliance on the first piece of information is called ____________
Answer: Anchoring bias
Question: Gamblers' fallacy is ____________.
Answer: a clustering illusion
Question: When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________.
Answer: sunk-cost fallacy
Question: Which of the following statements is true? (a) Analytical thinking is not based on facts. (b) Heuristic thinking is slow. (c) Using intuition is a way of analytical thinking. (d) Experimentation is a way of analytical thinking.
Answer: Experimentation is a way of analytical thinking
Question: Which of the following biases cannot be categorized as a cognitive bias?
Answer: None of the answer selections are correct
Question: A person who is convinced he is gaining admission to Harvard by merely applying is suffering from:
Answer: Overconfidence
Question: When you buy a new car, you value it more than the price you paid because of:
Answer: Endowment effect bias
Question: Which of the following is not a drawback of analytical decision making?
Answer: None of the answer selections are correct
Question: You bought a top of the line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here?
Answer: Bandwagon effect
Question: What kinds of bias could show up when collecting data?
Answer: All of the answer selections are correct
Question: Which of the following statements is not true about artificial neural networks?
Answer: In the hidden layer of the networks, input data is hidden
Question: Which of the following is an example of association rule learning?
Answer: How frequently an item set occurs in a transaction
Question: An ideal machine learning process needs
Answer: All other answers are true.
Question: Which of the following examples is not an application of AI?
Answer: Predicting the exam score by scanning the appropriate textbook
Question: Which of the following techniques is a modern update of artificial neural networks?
Answer: Deep learning
Question: Which of the following statements below is true about supervised/unsupervised machine learning?
Answer: Supervised learning requires labeled data for training
Question: Which of the following is an example of unsupervised machine learning?
Answer: Clustering
Question: Artificial Intelligence _______
Answer: Is a broad science of mimicking human abilities
Question: AI is not embraced everywhere in every industry because _______.
Answer: It can be operationally expensive
Question: In developing spam filter algorithms, we need
Answer: Labeled data of both spam and non-spam emails