
CMPC3M03 - INFORMATION RETRIEVAL

Module Code: CMPC3M03
Department: Computing Sciences
Credit Value: 20
Level: 3
Organiser: Mr. Stephen Cox
The module explores the development of Information Retrieval technologies, which has been driven by the large increase in on-line documents and by Internet search engines. The main topics covered include information retrieval models and architecture, Web-based retrieval, multimedia retrieval, and common NLP techniques and their role in IR. Previous experience of a high-level programming language is required, and either CMPS2B23 or CMPS2B26 is a desirable prerequisite.

Lecture notes and materials will be made available via Blackboard; references to books, papers and other on-line resources are given in lectures.


Course Text:

Manning, C. D., Raghavan, P. and Schütze, H. (2008) Introduction to Information Retrieval, Cambridge University Press, ISBN-10: 0521865719
(A companion website and a preliminary version of this book are available.)

Other textbooks:

  • Levene, M. (2006) An Introduction to Search Engines and Web Navigation, Addison Wesley, ISBN: 0-321-30677-5
  • Belew, R. K. (2000) Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW, CUP, ISBN: 0-521-63028-2

Submission:

Written coursework should be submitted by following the standard CMP practice. Students are advised to refer to the Guidelines and Hints on Written Work in CMP.

Deadlines:

If coursework is handed in after the deadline day or an agreed extension:
 

 

Work submitted | Marks deducted
After 15:00 on the due date and before 15:00 on the day following the due date | 10 marks
After 15:00 on the second day after the due date and before 15:00 on the third day after the due date | 20 marks
After 15:00 on the third day after the due date and before 15:00 on the 20th day after the due date | All the marks the work would have merited if submitted on time (i.e. no marks awarded)
After 20 working days | Work will not be marked and a mark of zero will be entered


Saturdays and Sundays will NOT be taken into account when calculating the marks deducted.

All extension requests will be managed through the LTS Hub. A request for an extension to a deadline for the submission of work for assessment should be submitted by the student to the appropriate Learning and Teaching Service Hub, prior to the deadline, on a University Extension Request Form accompanied by appropriate evidence. Extension requests will be considered by the appropriate Learning and Teaching Service Manager in those instances where (a) acceptable extenuating circumstances exist and (b) the request is submitted before the deadline. All other cases will be considered by a Coursework Coordinator in CMP.

For more details, including how to apply for an extension due to extenuating circumstances, download Submission for Work Assessment (PDF, 39KB).
 

Plagiarism:

Plagiarism is the copying or close paraphrasing of published or unpublished work, including the work of another student, without due acknowledgement. Plagiarism is regarded as a serious offence by the University, and all cases will be investigated. Possible consequences of plagiarism include deduction of marks and disciplinary action, as detailed in UEA's Policy on Plagiarism and Collusion.


Module specific:

  • To introduce the classic vector-based and probabilistic models of information retrieval
  • To survey some common techniques for improving information retrieval (e.g. relevance feedback, link analysis algorithms, recommender systems)
  • To introduce natural language processing techniques that are used to improve the performance of information retrieval systems
  • To describe the main issues in engineering scalable information retrieval systems
  • To understand the main issues and measures used in evaluating the performance of information retrieval and document classification systems
  • To give students practical experience of information retrieval systems, experimental work and evaluation
  • To describe some of the principal approaches and issues in retrieving non-text information, e.g. music and video
  • To introduce more specialised topics related to information retrieval and natural language processing (e.g. information extraction, document classification)

Module specific:

On completion of this module students should achieve the following:

  • Understanding and experience of a range of information retrieval techniques and models and their application, particularly in Web searching
  • Understanding of how natural language processing techniques may be applied to information retrieval
  • Appreciation of the main issues and techniques in multimedia information retrieval
  • Appreciation of the achievements and limitations of current information retrieval approaches

Transferable skills:

On completion of this module students should achieve the following:

  • Improved research and communication skills
  • Improved programming skills
  • Further experience of report-writing


 


Total hours: 40

Lectures: 20 hours (with provisional weekly schedule)

  1. The IR process
  2. Indexes and terms
  3. Vector space modelling
  4. Probabilistic models of IR
  5. Relevance, feedback, evaluation
  6. Text classification
  7. Web search
  8. Phrases, parts of speech and stemming
  9. Language modelling, parsing
  10. Audio and music retrieval
  11. Image and video retrieval

Workshops: 0 hours

Laboratory work: 20 hours (with provisional weekly schedule)

  1. MATLAB programming (if required)
  2. Indexing for IR systems
  3. Document ranking with the vector space model
  4. Probabilistic document ranking
  5. Evaluation
  6. N-gram language modelling
  7. Other topics depending on the coursework topic

Examination with Coursework or Project