
Data Warehousing & Data Mining M.C.A Question Paper : vardhaman.org

College : Vardhaman College Of Engineering
Degree : M.C.A
Semester : V
Subject : Data Warehousing & Data Mining
Document type : Question Paper
Website : vardhaman.org


Vardhaman Data Warehousing Question Paper

MCA V Semester Regular Examinations, December – 2013
(Master of Computer Applications)
Date : 3 December, 2013
Time : 3 Hours
Max. Marks : 60

** Answer any FIVE Questions.
** All Questions carry equal marks


Instructions

All parts of a question must be answered in one place only.

1. a) The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Explain the different types of schemas for multidimensional data models, with an example of each schema. 8M

b) A data cube is a lattice of cuboids. Suppose that you would like to create a data cube for AllElectronics sales that contains the following: city, item, year, and sales in dollars. You would like to be able to analyze the data with queries such as the following:
“Compute the sum of sales, grouping by city and item.”
“Compute the sum of sales, grouping by city.”
“Compute the sum of sales, grouping by item.”
What is the total number of cuboids, or group-by’s, that can be computed for this data cube? Assume necessary data. 4M
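For Question 1(b), one way to check the count is to enumerate every subset of the dimensions, since each subset corresponds to one group-by. The sketch below assumes that city, item and year are the only dimensions, that sales in dollars is the measure, and that no dimension has a concept hierarchy; under those assumptions there are 2^3 = 8 cuboids, from the apex cuboid () to the base cuboid (city, item, year). The name all_cuboids is hypothetical.

```python
from itertools import combinations

# Assumption: three dimensions, no concept hierarchies; sales_in_dollars is the measure.
DIMENSIONS = ("city", "item", "year")

def all_cuboids(dims):
    """Return every subset of dims; each subset is one cuboid (group-by)."""
    return [combo for k in range(len(dims) + 1) for combo in combinations(dims, k)]

cuboids = all_cuboids(DIMENSIONS)
for c in cuboids:
    # The empty tuple is the apex cuboid, i.e. total sales with no grouping.
    print(c if c else "() apex cuboid")
print(len(cuboids), "cuboids")  # → 8 cuboids
```

With concept hierarchies, the count would instead be the product over dimensions of (number of hierarchy levels + 1), so the value 8 holds only under the no-hierarchy assumption.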


2. a) What are discrepancies in data sets? Explain the different tools used for discrepancy detection. 6M
b) Discuss whether or not each of the following activities is a data mining task.
i. Monitoring seismic waves for earthquake activities.
ii. Extracting the frequencies of a sound wave.
iii. Predicting the outcomes of tossing a (fair) pair of dice. 6M

3. a) Robust data loading poses a challenge in database systems because the input data are often dirty. In many cases, an input record may have several missing values and some records could be contaminated. Work out an automated data cleaning and loading algorithm so that the erroneous data will be marked and contaminated data will not be mistakenly inserted into the database during data loading. 8M
b) Discuss the activities involved in Data Transformation. 4M

4. a) Let game refer to the transactions containing computer games, and video refer to those containing videos. Of the 10,000 transactions analyzed, the data show that 6000 of the customer transactions included computer games, 7500 included videos, and 4000 included both computer games and videos.
i. Construct the contingency table and compute the support and confidence for the rule buys(X, games) => buys(X, video).
ii. Prove that strong rules are not necessarily interesting, given min_support = 35% and min_confidence = 60%. Use the lift measure to find the correlation between games and videos. 8M

b) Discuss the various factors that affect the computational complexity of the Apriori algorithm. 4M
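The arithmetic behind Question 4(a) can be checked with exact fractions. The sketch below builds the 2 × 2 contingency table from the three counts given in the question and computes support, confidence and lift = P(game ∧ video) / (P(game) P(video)), using the thresholds stated in part ii. The variable names are illustrative only.

```python
from fractions import Fraction

N = 10_000      # transactions analyzed
GAME = 6_000    # transactions containing computer games
VIDEO = 7_500   # transactions containing videos
BOTH = 4_000    # transactions containing both

# 2x2 contingency table derived from the three counts above
table = {
    ("game", "video"): BOTH,
    ("game", "no_video"): GAME - BOTH,
    ("no_game", "video"): VIDEO - BOTH,
    ("no_game", "no_video"): N - GAME - VIDEO + BOTH,
}

support = Fraction(BOTH, N)         # P(game and video)
confidence = Fraction(BOTH, GAME)   # P(video | game)
lift = support / (Fraction(GAME, N) * Fraction(VIDEO, N))

# Thresholds from part ii of the question
is_strong = support >= Fraction(35, 100) and confidence >= Fraction(60, 100)
correlation = "negative" if lift < 1 else ("positive" if lift > 1 else "independent")

print(table)
print(f"support={float(support):.2%}, confidence={float(confidence):.2%}, lift={float(lift):.3f}")
# → support=40.00%, confidence=66.67%, lift=0.889
print("strong:", is_strong, "| correlation:", correlation)
```

With these counts the rule passes both thresholds, yet its lift of 8/9 ≈ 0.889 is below 1, so buying games and buying videos are negatively correlated.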

5. a) Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve Bayesian classification. 6M
b) It is difficult to assess classification accuracy when individual data objects may belong to more than one class at a time. In such cases, comment on what criteria you would use to compare different classifiers modeled after the same data. 6M
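To illustrate the “naïve” assumption in Question 5(a), the minimal sketch below scores each class C as P(C) multiplied by the product of per-attribute likelihoods P(x_k | C), which treats the attributes as conditionally independent given the class. The toy training rows, the function names and the use of Laplace (add-one) smoothing are assumptions made for illustration; they are not part of the question.

```python
from collections import Counter, defaultdict
from math import prod

def train_naive_bayes(rows):
    """rows: list of (attribute-dict, class-label) pairs (hypothetical format)."""
    class_counts = Counter(label for _, label in rows)
    # value_counts[label][attr][value] = rows of that class having that attribute value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)  # distinct values seen per attribute, for smoothing
    for attrs, label in rows:
        for attr, value in attrs.items():
            value_counts[label][attr][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def class_scores(model, attrs):
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        prior = n_c / total
        # The "naive" step: P(X | C) is taken as the product of P(x_k | C),
        # each estimated with add-one (Laplace) smoothing.
        likelihood = prod(
            (value_counts[label][attr][value] + 1) / (n_c + len(domains[attr]))
            for attr, value in attrs.items()
        )
        scores[label] = prior * likelihood
    return scores

def predict(model, attrs):
    scores = class_scores(model, attrs)
    return max(scores, key=scores.get)

# Hypothetical toy data
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
]
model = train_naive_bayes(train)
print(predict(model, {"outlook": "sunny", "windy": "no"}))  # → play
```

For the query shown, the unnormalized scores are 0.5 × 0.4 × 0.75 = 0.15 for play and 0.5 × 0.4 × 0.25 = 0.05 for stay. The smoothing keeps the zero count for (stay, windy = no) from wiping out the whole product.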

6. a) Why is it that BIRCH encounters difficulties in finding clusters of arbitrary shape, but OPTICS does not? Can you propose some modifications to BIRCH to help it find clusters of arbitrary shape? 6M
b) Why is outlier mining important? Briefly describe the approaches behind statistical-based outlier detection, distance-based outlier detection, and deviation-based outlier detection. 6M

7. a) The concept of microclustering has been popular for on-line maintenance of clustering information for data streams. By exploring the power of microclustering, design an effective density-based clustering method for clustering evolving data streams. 6M
b) Tremendous and potentially infinite volumes of data streams are often generated by real-time surveillance systems, communication networks, and other dynamic environments. Elaborate on the different types of data stream mining, with an example. 6M

8. a) A heterogeneous database system consists of multiple database systems that are defined independently, but that need to exchange and transform information among themselves and answer local and global queries. Discuss how to process a descriptive mining query in such a system using a generalization-based approach. 6M

b) Spatial association mining can be implemented in at least two ways: (i) Dynamic computation of spatial association relationships among different spatial objects, based on the mining query, and (ii) Precomputation of spatial distances between spatial objects, where the association mining is based on such precomputed results. Discuss how to implement each approach efficiently.
