Name of the College : Mahatma Gandhi University
Department : Computer Science and Engineering
Subject Code/Name : MCSIS 106-1/Data Mining And Knowledge Discovery
Sem : I
Website : mgu.ac.in
Document Type : Model Question Paper
Download Model/Sample Question Paper :
I : https://www.pdfquestion.in/uploads/mgu.ac.in/5024-1-MCSIS%20106-1%20DMKD-Set%201.doc
II : https://www.pdfquestion.in/uploads/mgu.ac.in/5024-2-MCSIS%20106-1%20DMKD-Set%202.doc
Data Mining & Knowledge Discovery Question :
M.TECH. Degree Examination, December 2013 :
Branch: Computer Science Engineering
Specialization: Information Systems
Model Question Paper – I
First Semester
MCSIS 106-1 : Data Mining And Knowledge Discovery
(Regular – 2013 Admissions)
Time: Three Hours
Maximum: 100 marks
Answer all questions.
Any data, if required, may be suitably assumed and clearly indicated.
1. a. What is knowledge discovery? What is the role of data mining in the process of knowledge discovery? (11 marks)
b. What are the major tasks in data preprocessing? (7 marks)
c. Explain concept hierarchy generation. With a suitable example, show how it is done for categorical data. (7 marks)
OR
2. a. What is data reduction? What are the different data reduction strategies? (10 marks)
b. What are the desirable properties of discovered knowledge? (8 marks)
c. Explain the different tasks in data cleaning. (7 marks)
3. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes. (15 marks)
(b) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., “A”, “B”, etc.): (10 marks)
OR
4. a. Explain task specific population initialization and seeding. (10 marks)
b. Explain constraint based association mining. (10 marks)
c. What are the fitness function and evaluation? (5 marks)
5. a. What is prediction? What are the issues related to classification and prediction? (10 marks)
b. Write the algorithm for classification by decision tree induction. (8 marks)
c. What is Bayes Theorem? Explain Naïve Bayes Classification. (7 marks)
OR
6. a. Explain rule based classification with an example. (10 marks)
b. Explain linear regression and nonlinear regression. (10 marks)
c. Define the terms sensitivity, specificity, accuracy, precision and write their equations. (5 marks)
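Note on question 6(c): the four measures are usually defined from confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts (TP = 90, FN = 10, TN = 80, FP = 20) are hypothetical values chosen only for illustration and do not come from the question paper.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute four common evaluation measures from confusion-matrix counts.

    tp/fn are the positive tuples classified correctly/incorrectly,
    tn/fp are the negative tuples classified correctly/incorrectly.
    """
    positives = tp + fn  # all actual positive tuples
    negatives = tn + fp  # all actual negative tuples
    return {
        "sensitivity": tp / positives,                    # true positive rate
        "specificity": tn / negatives,                    # true negative rate
        "accuracy": (tp + tn) / (positives + negatives),  # overall fraction correct
        "precision": tp / (tp + fp),                      # fraction of predicted positives that are correct
    }


# Hypothetical counts, assumed purely for illustration.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy can equivalently be written as a weighted sum of sensitivity and specificity, weighted by the share of positive and negative tuples.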
7. a. What is cluster analysis? What are its desired features? (11 marks)
b. Write a brief note on various cluster analysis methods. (8 marks)
c. Explain the clustering of high-dimensional data. (7 marks)
OR
8. Suppose that a data mining task is to cluster the following eight points (with (x,y) representing locations) into three clusters: A1(4,6), A2(2,5), A3(9,3), A4(6,9), A5(7,5), A6(5,7), A7(2,2), A8(6,6). Suppose initially we assign A1, A2 and A3 as the seeds of three clusters that we wish to find. Use the K-means method to show:
a. The three cluster centroids after the first iteration using the Manhattan distance. (10 marks)
b. The final three clusters. (15 marks)
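Note on question 8: one way to check a hand-worked answer is the minimal sketch below. It makes two assumptions that the question does not state: centroids are recomputed as coordinate-wise means (standard K-means) while points are assigned by Manhattan distance, and a point equidistant from two centroids goes to the lower-numbered cluster. The function and variable names are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

POINTS: Dict[str, Point] = {
    "A1": (4, 6), "A2": (2, 5), "A3": (9, 3), "A4": (6, 9),
    "A5": (7, 5), "A6": (5, 7), "A7": (2, 2), "A8": (6, 6),
}
SEEDS: List[Point] = [POINTS["A1"], POINTS["A2"], POINTS["A3"]]


def manhattan(p: Point, q: Point) -> float:
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points: Dict[str, Point], centroids: List[Point]) -> List[List[str]]:
    """Put each point in the cluster of its nearest centroid.

    min() returns the first minimum, so ties go to the lower-numbered
    cluster (an assumption; the question gives no tie rule).
    """
    clusters: List[List[str]] = [[] for _ in centroids]
    for name, p in points.items():
        nearest = min(range(len(centroids)), key=lambda i: manhattan(p, centroids[i]))
        clusters[nearest].append(name)
    return clusters


def recompute(points: Dict[str, Point], clusters: List[List[str]],
              old: List[Point]) -> List[Point]:
    """Recompute each centroid as the mean of its members (keep old one if empty)."""
    new: List[Point] = []
    for members, previous in zip(clusters, old):
        if not members:
            new.append(previous)
            continue
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return new


def kmeans(points: Dict[str, Point], seeds: List[Point], max_iter: int = 100):
    """Alternate assignment and centroid update until the clusters stop changing."""
    centroids = list(seeds)
    clusters = assign(points, centroids)
    for _ in range(max_iter):
        centroids = recompute(points, clusters, centroids)
        new_clusters = assign(points, centroids)
        if new_clusters == clusters:
            break
        clusters = new_clusters
    return clusters, centroids


# Part (a): centroids after the first assignment-and-update pass.
first_centroids = recompute(POINTS, assign(POINTS, SEEDS), SEEDS)
# Part (b): clusters once the assignment no longer changes.
final_clusters, final_centroids = kmeans(POINTS, SEEDS)

print("Centroids after iteration 1:", first_centroids)
print("Final clusters:", final_clusters)
```

In the first pass A5(7,5) is at Manhattan distance 4 from both A1 and A3, so the tie rule decides the outcome. With ties going to the lower-numbered cluster, the sketch yields the clusters {A1, A4, A5, A6, A8}, {A2, A7} and {A3}; a different tie rule can give a different answer.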
Model Question Paper – II
MCSIS 106-1 : Data Mining And Knowledge Discovery
(Regular – 2013 Admissions)
Time: Three Hours
Maximum: 100 marks
Answer all questions.
Any data, if required, may be suitably assumed and clearly indicated.
1. a. Explain the different data mining functionalities. (11 marks)
b. What are the major issues in data mining? (8 marks)
c. What is binning? With a suitable example show the various binning methods for data smoothing. (6 marks)
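Note on question 1(c): the sketch below illustrates two common smoothing variants, smoothing by bin means and smoothing by bin boundaries, on equal-frequency bins. The nine price values, the bin size of 3, the function names and the rule that a value midway between two boundaries moves to the lower one are all assumptions made for illustration; none of them is given in the question paper.

```python
from typing import List


def equal_frequency_bins(values: List[float], size: int) -> List[List[float]]:
    """Sort the values and cut them into consecutive bins of `size` items."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def smooth_by_means(bins: List[List[float]]) -> List[List[float]]:
    """Replace every value in a bin by the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins: List[List[float]]) -> List[List[float]]:
    """Replace every value by the closer of its bin's minimum and maximum.

    Ties go to the lower boundary (an assumption for this sketch).
    """
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Illustrative data only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, size=3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)

print("Bins:         ", bins)
print("By means:     ", by_means)
print("By boundaries:", by_boundaries)
```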
OR
2. a. What is data compression? What are the different methods of data compression? (8 marks)
b. What is data integration? What are the different types of data integration? (7 marks)
c. Explain concept hierarchy generation. With a suitable example, show how it is done for categorical data. (10 marks)
3. A database has four transactions. Let min_sup = 60% and min_conf = 80%.
a. At the granularity of item category (e.g., item_i could be “Milk”), for the following rule template,
List the frequent k-itemset for the largest k, and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k. (15 marks)
b. At the granularity of brand-item category (e.g., item_i could be “Sunset-Milk”), for the following rule template list the frequent k-itemset for the largest k (but do not print any rules). (10 marks)
OR
4. a. Briefly explain selection, crossover and mutation in genetic algorithms. (15 marks)
b. Explain correlation analysis with an example. (10 marks)
5. a. Explain bagging and boosting. Write the algorithm for AdaBoost. (10 marks)
b. What is prediction? What are the issues related to classification and prediction? (10 marks)
c. How is classification done using support vector machines? (5 marks)
OR
6. a. Explain with an example how classification is done using decision tree induction. (15 marks)
b. What is Bayes Theorem? Explain Bayesian Belief networks with a suitable example. (10 marks)
7. a. Explain density based clustering methods. (10 marks)
b. Explain grid based clustering method. (9 marks)
c. Explain the clustering of high-dimensional data. (6 marks)
OR
8. The following 6 objects, each with two attributes, are to be clustered: A1(4,6), A2(2,5), A3(9,3), A4(6,9), A5(7,5), A6(5,7)
a. Show the distance matrix for the 6 objects using the Manhattan distance. (5 marks)
b. Using the divisive method determine the two objects that should form the basis for splitting the above dataset. (10 marks)
c. Now split the dataset using the two objects identified in part (b) using the k-means method. (10 marks)
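Note on question 8: the sketch below builds the Manhattan distance matrix asked for in part (a) and lists the pairs of objects that are farthest apart. Treating the farthest pair as the basis for a divisive split is one common reading of part (b) and is an assumption here, as are the variable names.

```python
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[int, int]

OBJECTS: Dict[str, Point] = {
    "A1": (4, 6), "A2": (2, 5), "A3": (9, 3),
    "A4": (6, 9), "A5": (7, 5), "A6": (5, 7),
}


def manhattan(p: Point, q: Point) -> int:
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


names = list(OBJECTS)

# Full symmetric distance matrix, stored as a nested dict keyed by object name.
matrix = {a: {b: manhattan(OBJECTS[a], OBJECTS[b]) for b in names} for a in names}

# Candidate split basis (assumed interpretation): the pair(s) with the largest distance.
largest = max(matrix[a][b] for a, b in combinations(names, 2))
farthest_pairs = [(a, b) for a, b in combinations(names, 2) if matrix[a][b] == largest]

print("    " + "  ".join(names))
for a in names:
    print(a, "  ".join(f"{matrix[a][b]:>2}" for b in names))
print("Largest distance:", largest, "between", farthest_pairs)
```

For these six objects the largest distance, 9, occurs for two pairs, (A2, A3) and (A3, A4), so an answer to part (b) should state which pair it uses and why.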