
CMPSMC24 - DATA MINING

Module Code:
CMPSMC24
Department:
Computing Sciences
Credit Value:
20
Level:
M
Organiser:
Dr. Beatriz De La Iglesia
This module is designed for postgraduate students on MSc courses. It explores the methodologies of Knowledge Discovery and Data Mining (KDD) and aims to cover each stage of the KDD process, including data gathering, preliminary data analysis or data exploration, data cleansing, pre-processing and the various data analysis tasks that fall under the heading of data mining. Through this module, students should gain knowledge of algorithms and methods for each stage of the process, as well as practical experience of using leading KDD software packages at every stage of the KDD process.

Course notes will be distributed during lectures and also made available on the Blackboard site for this module.

Students will have to use the packages installed in the CMP labs for their practical work. Some of these packages (not all) are available for installation on students' own machines. Laboratory work will take place during timetabled laboratory periods. The coursework may require students to spend time working in the School's laboratories outside the timetabled hours.


The main library catalogue currently lists a few data mining books; they are shelved at XD1353.
 

Recommended reading:

Dunham, M.H. (2003) Data Mining: Introductory and Advanced Topics, Prentice Hall.

Other relevant textbooks:

  • Pyle, D. (1999) Data Preparation for Data Mining, Morgan Kaufmann Publishers Inc., San Francisco
  • Witten, I.H. and Frank, E. (2000) Data Mining, Morgan Kaufmann Publishers Inc., San Francisco
  • Han, J. and Kamber, M. (2001) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco
  • Tan, P.N., Steinbach, M. and Kumar, V. (2006) Introduction to Data Mining, Addison Wesley, Boston

Also the proceedings from the International Conference on Knowledge Discovery in Databases (1995-200) have a wealth of relevant papers.
 

Web-based material:


The kdnuggets.com page is probably one of the most informative and up-to-date pages for KDD.


Submission:

Written coursework should be submitted by following the standard CMP practice. Students are advised to refer to the Guidelines and Hints on Written Work in CMP.

Deadlines:

If coursework is handed in after the deadline day or an agreed extension:

Work submitted | Marks deducted
After 15:00 on the due date and before 15:00 on the day following the due date | 10 marks
After 15:00 on the second day after the due date and before 15:00 on the third day after the due date | 20 marks
After 15:00 on the third day after the due date and before 15:00 on the 20th day after the due date | All the marks the work merits if submitted on time (i.e. no marks awarded)
After 20 working days | Work will not be marked and a mark of zero will be entered


Saturdays and Sundays will NOT be taken into account for the purposes of calculation of marks deducted.

All extension requests will be managed through the LTS Hub. A request for an extension to a deadline for the submission of work for assessment should be submitted by the student to the appropriate Learning and Teaching Service Hub, prior to the deadline, on a University Extension Request Form accompanied by appropriate evidence. Extension requests will be considered by the appropriate Learning and Teaching Service Manager in those instances where (a) acceptable extenuating circumstances exist and (b) the request is submitted before the deadline. All other cases will be considered by a Coursework Coordinator in CMP.

For more details, including how to apply for an extension due to extenuating circumstances, download Submission for Work Assessment (PDF, 39KB).
 

Plagiarism:

Plagiarism is the copying or close paraphrasing of published or unpublished work, including the work of another student, without due acknowledgement. Plagiarism is regarded as a serious offence by the University, and all cases will be investigated. Possible consequences of plagiarism include deduction of marks and disciplinary action, as detailed by UEA's Policy on Plagiarism and Collusion.


Module specific:

  • To obtain an overall view of the complex process of Knowledge Discovery (KDD) and Data Mining and understand the need for a methodical approach to KDD
  • To explore and review tools and algorithms available to each stage of the KDD process
  • To gain experience of using KDD software tools in medium to large databases
  • To learn to evaluate the suitability of software tools in the context of different data analysis tasks
  • To learn to combine data manipulation and analysis approaches in order to improve the quality of input data
  • To present knowledge induced in a format suitable for the target audience and for a particular application
  • To perform cost/benefit analysis of any discovered knowledge so that the outcome of a KDD project can be "sold" successfully

Transferable skills:

  • To present findings to technical and non-technical audiences using appropriate methods in each case
  • To gain further experience in IT skills by the use of different packages running on different operating systems and platforms
  • To learn how to search for relevant reference material using all available sources of information, and particularly the Internet
  • To practise problem solving using a methodical approach

On completion of this module students should have achieved the following skills:

Module specific:

  • Understanding of the complex process of KDD and Data Mining and the need for a methodical approach to KDD.
  • Critical evaluation of tools and algorithms available to each stage of the KDD process.
  • Competence in using KDD software tools in medium to large databases.
  • Competence in applying relevant techniques at each stage of the KDD process.
  • Ability to evaluate the suitability of software tools in the context of different data analysis tasks.
  • Competence in combining data manipulation and analysis approaches in order to improve the quality of input data.
  • Understanding and identification of problems in input data, such as outliers, missing data, unreliable data and differences in granularity, and selection of an adequate strategy to deal with the problem data.
  • Presentation of knowledge induced in a format suitable for the target audience and for the particular application.
  • Ability to perform cost/benefit analysis of any discovered knowledge so that the outcome of a KDD project can be "sold" successfully.
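
As an illustration of two of the input-data problems named above (missing data and outliers), the following is a minimal sketch in Python rather than in any of the packages used in the laboratories. The function name clean_column, the use of None to mark a missing entry, median imputation, the 1.5 x IQR outlier rule and the sample ages are all assumptions chosen for the example; they are only one of several possible strategies.

```python
from statistics import median, quantiles


def clean_column(values):
    """Impute missing values with the median and flag IQR outliers.

    `values` is a list of numbers in which None marks a missing entry
    (an assumption of this sketch). Returns (imputed_values, outlier_flags).
    """
    present = [v for v in values if v is not None]
    med = median(present)
    # Quartiles of the observed values only.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Replace each missing entry with the median of the observed values.
    imputed = [med if v is None else v for v in values]
    # Only observed values can be flagged; imputed entries are never flagged.
    flags = [v is not None and not (low <= v <= high) for v in values]
    return imputed, flags


# Hypothetical sample: one missing age and one implausibly large age.
ages = [23, 25, None, 24, 26, 22, 95, 27]
imputed, flags = clean_column(ages)
```

Flagging outliers rather than deleting them leaves the decision on how to treat them (removal, capping, further investigation) to a later step.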

This module is delivered as a programme of lectures (22 hours) and laboratory classes (18 hours).

Total hours: 40

Lectures: 22 hours (with provisional weekly schedule)

  1. Introduction to KDD: concepts, definitions and applications
  2. The KDD Roadmap dissected
  3. Initial stages: data warehousing, data marts, OLAP
  4. Data cleansing: missing data, outlier handling, balancing, sampling
  5. Data pre-processing: feature subset selection, feature construction, discretisation, principal component analysis
  6. Data mining: clustering
  7. Data mining: classification using decision trees
  8. Data mining: classification using Neural Nets
  9. Data mining: partial classification and association rules
  10. Text mining
  11. Case studies.
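
To give a flavour of the pre-processing topics listed above, the following is a minimal sketch of equal-width discretisation in Python. It is not taken from the course notes: the function name equal_width_bins, the choice of equal-width (rather than, say, equal-frequency) binning and the sample incomes are assumptions made for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample: incomes split into three equal-width intervals.
incomes = [12, 18, 25, 31, 40, 58, 72]
bins = equal_width_bins(incomes, 3)
```

Equal-width binning is simple but sensitive to extreme values, because a single large value stretches every interval.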

Workshops: 0 hours

Laboratory classes: 18 hours (with provisional weekly schedule)

  1. Clementine tutorial
  2. Using Clementine - basic features
  3. Using Clementine - advanced features
  4. KnowledgeSeeker - tutorial
  5. Using KnowledgeSeeker - basic features
  6. Using KnowledgeSeeker - advanced features
  7. DataLamp tutorial
  8. Using DataLamp: simple features
  9. Using DataLamp: advanced features.

Coursework