MC1630 Data Warehousing & Mining MCA Question Bank : niceindia.com
Name of the College : Noorul Islam College of Engineering
University : Anna University
Degree : MCA
Department : Computer Applications
Subject Code/Name : MC 1630 – Data Warehousing & Data Mining
Document Type : Question Bank
Website : niceindia.com
Download Model/Sample Question Paper : https://www.pdfquestion.in/uploads/niceindia.com/3083-MC1630_-_DATA_WAREHOUSING_AND_DATA_MINING.pdf
NICE Data Warehousing & Data Mining Question Paper
1. Why is accuracy an important factor in assessing the success of data mining? :
Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied to models, accuracy refers to the degree of fit between the model and the data. This measures how error-free the model’s predictions are. Since accuracy does not include cost information, it is possible for a less accurate model to be more cost-effective.
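The last point can be illustrated with a small sketch. The labels, the two models and the error costs below are hypothetical values chosen for illustration; they are not part of the question bank.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def total_cost(predicted, actual, cost_false_positive, cost_false_negative):
    """Total cost of the errors for binary 0/1 labels, with one cost per error type."""
    cost = 0.0
    for p, a in zip(predicted, actual):
        if p == 1 and a == 0:
            cost += cost_false_positive
        elif p == 0 and a == 1:
            cost += cost_false_negative
    return cost


# Hypothetical data: 2 positive cases out of 10.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses both positives
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # finds both positives, 3 false alarms

# Assumed costs: a missed positive is ten times as expensive as a false alarm.
for name, model in (("A", model_a), ("B", model_b)):
    print(name, accuracy(model, actual), total_cost(model, actual, 1.0, 10.0))
```

Under these assumed costs, model B is less accurate than model A but cheaper overall, because its errors are of the less expensive kind.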
2. What is an antecedent? :
When an association between two variables is defined, the first item (or left-hand side) is called the antecedent. For example, in the relationship “When a prospector buys a pick, he buys a shovel 14% of the time,” “buys a pick” is the antecedent.
3. Describe the association algorithm.
An association algorithm creates rules that describe how often events have occurred together. For example, “When prospectors buy picks, they also buy shovels 14% of the time.” Such relationships are typically expressed with a confidence value, such as the 14% in this example.
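As a rough illustration of how such rules can be derived by counting, the sketch below computes the share of baskets containing one item that also contain another, for every ordered pair of items. The function name rule_confidences and the sample baskets are hypothetical; practical association algorithms such as Apriori also prune infrequent item sets, which this sketch does not attempt.

```python
from collections import Counter
from itertools import permutations


def rule_confidences(transactions):
    """Return {(antecedent, consequent): share of antecedent baskets that also hold the consequent}."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        # Every ordered pair (a, b) of distinct items that occur together.
        pair_counts.update(permutations(sorted(items), 2))
    return {
        (a, b): pair_counts[(a, b)] / item_counts[a]
        for (a, b) in pair_counts
    }


# Hypothetical purchase records.
transactions = [
    {"pick", "shovel"},
    {"pick", "lantern"},
    {"pick", "shovel", "lantern"},
    {"shovel"},
    {"pick"},
]

rules = rule_confidences(transactions)
# 2 of the 4 baskets that contain a pick also contain a shovel.
print(rules[("pick", "shovel")])
```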
4. What is Binning? :
A data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For example, age could be converted to bins such as 20 or under, 21-40, 41-65 and over 65.
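The age example can be sketched as follows. The function name age_bin is hypothetical, and treating each boundary as an inclusive upper limit (so a non-integer age such as 20.5 falls into the 21-40 bin) is an assumption.

```python
from bisect import bisect_left

# Inclusive upper limit of every bin except the last, open-ended one.
UPPER_BOUNDS = [20, 40, 65]
LABELS = ["20 or under", "21-40", "41-65", "over 65"]


def age_bin(age):
    """Replace a continuous age with the label of the bin it falls into."""
    if age < 0:
        raise ValueError("age must not be negative")
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


print([age_bin(a) for a in (18, 20, 21, 40, 41, 65, 66)])
```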
5. Describe the chi-squared statistic.
A statistic that assesses how well a model fits the data. In data mining, it is most commonly used to find homogeneous subsets for fitting categorical trees as in CHAID.
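A minimal sketch of the Pearson form of the statistic for a table of observed counts (rows for the values of one categorical variable, columns for the other) is given below. The function name and the sample counts are hypothetical, and the sketch computes only the raw statistic; it does not show how CHAID uses it to choose splits.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("every row and column needs a non-zero total")
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: the two rows behave very differently across the columns.
print(chi_squared([[30, 10], [10, 30]]))
# Identical rows: no evidence of association.
print(chi_squared([[20, 20], [20, 20]]))
```

A larger value indicates that the observed counts depart further from what independence would predict.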
6. What is CHAID? :
An algorithm for fitting categorical trees. It relies on the chi-squared statistic to split the data into small, connected data sets.
7. What is the role of classification in data mining? :
Refers to the data mining problem of attempting to predict the category of categorical data by building a model based on some predictor variables.
8. What is a classification tree? :
A decision tree that places categorical variables into classes.
9. What is cleaning (cleansing)? :
Refers to a step in preparing data for a data mining activity. Obvious data errors are detected and corrected (e.g., improbable dates) and missing data is replaced.
10. What is confidence? :
Confidence of the rule “B given A” is a measure of how likely it is that B occurs when A has occurred. It is expressed as a percentage, with 100% meaning B always occurs if A has occurred. Statisticians refer to this as the conditional probability of B given A. When used with association rules, the term confidence is observational rather than predictive. (Statisticians also use this term in an unrelated way. There are ways to estimate an interval, and the probability that the interval contains the true value of a parameter is called the interval confidence. So a 95% confidence interval for the mean has a probability of .95 of covering the true value of the mean.)
11. What is a consequent? :
When an association between two variables is defined, the second item (or right-hand side) is called the consequent. For example, in the relationship “When a prospector buys a pick, he buys a shovel 14% of the time,” “buys a shovel” is the consequent.
12. What is continuous data? :
Continuous data can have any value in an interval of real numbers. That is, the value does not have to be an integer. Continuous is the opposite of discrete or categorical.
13. What is cross validation? :
A method of estimating the accuracy of a classification or regression model. The data set is divided into several parts, with each part in turn used to test a model fitted to the remaining parts.
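The procedure can be sketched in a few lines. The names k_fold_indices, cross_validate, fit_majority and score_accuracy are hypothetical, as is the sample data; the folds are taken in order without shuffling, which is a simplification, and the majority-class "model" only stands in for a real classifier.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, fit, score, k=5):
    """Average the test score over k folds; each fold is used once for testing."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


def fit_majority(xs, ys):
    """Stand-in model: always predict the most common training label."""
    return Counter(ys).most_common(1)[0][0]


def score_accuracy(model, xs, ys):
    """Share of test labels equal to the stand-in model's single prediction."""
    return sum(1 for y in ys if y == model) / len(ys)


# Hypothetical data set with six records.
xs = [1, 2, 3, 4, 5, 6]
ys = ["yes", "yes", "yes", "yes", "no", "no"]
print(cross_validate(xs, ys, fit_majority, score_accuracy, k=3))
```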
14. Define data.
Values collected through record keeping or by polling, observing, or measuring, typically organized for analysis or decision making. More simply, data is facts, transactions and figures.
15. Define data format.
Data items can exist in many formats such as text, integer and floating-point decimal. Data format refers to the form of the data in the database.
16. Define Data mining.
An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.
17. What is a data mining method? :
Procedures and algorithms designed to analyze the data in databases.
18. What do you mean by degree of fit? :
A measure of how closely the model fits the training data. A common measure is R-squared (r-square).
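One common definition of this measure is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("actual values must not all be identical")
    return 1.0 - ss_residual / ss_total


actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, [1.0, 2.0, 3.0, 4.0]))  # perfect fit
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
```

A value of 1 means the model reproduces the training values exactly, while a value of 0 means it does no better than always predicting their mean.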
19. What is a dependent variable? :
The dependent variables (outputs or responses) of a model are the variables predicted by the equation or rules of the model using the independent variables (inputs or predictors).
20. What is discriminant analysis? :
A statistical method based on maximum likelihood for determining boundaries that separate the data into categories.