Automatic Document Summarisation – Web-based AI Programming

Document summarization is the technique of identifying and extracting the important information in a text document. A summary is usually much shorter than the original document and must never be longer than half of it. In this assignment, you are required to complete the following tasks:

1. Summarization Algorithms: Discuss at least three document summarization techniques.

2. Implementation of summarization algorithm: Implement one of the document summarization techniques using Perl, Java or Python. Optionally, you can use an automatic summarization tool such as Mead [http://www.summarization.com/mead/].

3. Results: Rate the quality of the summaries produced by your program or tool, and present the summarization results.
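As a minimal sketch of task 2, assuming a simple frequency-based extractive approach in the spirit of Luhn's method (not Mead, and not the only valid choice), the Python below scores each sentence by the average document-wide frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. The half-length limit is approximated by counting sentences rather than words, which is a simplifying assumption. The stopword list, the regex sentence splitter and the function names `split_sentences`, `tokenize` and `summarize` are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with", "as", "by", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, ratio=0.3):
    """Extractive summary keeping about `ratio` of the sentences.

    The ratio is capped at 0.5 so that, except for a one-sentence input,
    the summary never contains more than half of the original sentences.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Document-wide term frequencies act as a rough measure of topic importance.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    k = max(1, int(len(sentences) * min(ratio, 0.5)))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the k best sentences but present them in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


if __name__ == "__main__":
    doc = ("Summarization reduces a document to its key sentences. "
           "Extractive summarization selects sentences from the document. "
           "The weather was pleasant yesterday. "
           "Sentence scores can be based on word frequency in the document.")
    print(summarize(doc, ratio=0.5))
```

On this four-sentence sample, the two sentences about summarization share the most frequent terms ("document", "summarization", "sentences") and are kept, while the off-topic weather sentence receives the lowest score. Averaging rather than summing the term frequencies is a design choice that stops long sentences from winning purely because they contain more words.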

Report Guidelines:

A “standard” experimental AI paper consists of the following sections:

1. Introduction
Motivate the problem you are addressing and describe, at a high level, how you are addressing it. What is the
problem? Why is it important? What is your basic approach? A short discussion of how it fits into related
work in the area is also desirable. Summarize the basic results and conclusions that you will present.
2. Problem Definition and Algorithm
2.1 Task Definition
Precisely define the problem you are addressing (i.e., formally specify the inputs and outputs). Elaborate on
why this is an interesting and important problem. Include a simple, specific example that gives the input and
output, shows how the output relates to the input, specifies the desired or achieved properties of the output,
and illustrates the basic terms used.

2.2 Algorithm Definition
Describe in reasonable detail the algorithm (rules) you are using to address this problem. A pseudo-code
description of the algorithm you are using is frequently useful. Trace through a concrete example, showing
how your algorithm processes this example. The example should be complex enough to illustrate all of the
important aspects of the problem but simple enough to be easily understood. If possible, an intuitively
meaningful example is better than one with meaningless symbols.

3. Experimental Evaluation
3.1 Methodology
What criteria are you using to evaluate your method? What specific hypotheses does your experiment
test? Describe the experimental methodology that you used. What are the dependent and independent
variables? What is the training/test data that was used, and why is it realistic or interesting? Exactly what
performance data did you collect and how are you presenting and analyzing it? Comparisons to competing
methods that address the same problem are particularly useful.
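One widely used criterion for rating summaries is word overlap with a human-written reference summary, as in the ROUGE family of metrics. The sketch below computes a simplified ROUGE-1 (unigram) recall, precision and F1 in plain Python. It is an approximation under stated assumptions: it omits the stemming, stopword options and multi-reference handling of the official ROUGE toolkit, and the helper names `unigrams` and `rouge1` are hypothetical.

```python
import re
from collections import Counter


def unigrams(text):
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between a system summary
    (candidate) and a human reference summary."""
    cand, ref = unigrams(candidate), unigrams(reference)
    # Counter intersection keeps the minimum count of each shared word (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    system = "the cat sat on the mat"
    reference = "the cat was on the mat"
    print(rouge1(system, reference))
```

Here five of the six reference words (counting "the" twice) also appear in the system summary, so recall, precision and F1 are all 5/6. Recall rewards covering the reference content, while precision penalises padding, so reporting both alongside F1 makes comparisons between competing methods easier to interpret.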
3.2 Results
Present the quantitative results of your experiments. Graphical presentations such as graphs and
histograms are frequently better than tables. What are the basic differences revealed in the data? Are they
statistically significant?
3.3 Discussion
Is your hypothesis supported? What conclusions do the results support about the strengths and weaknesses
of your method compared to other methods? How can the results be explained in terms of the underlying
properties of the algorithm and/or the data?

4. Related Work
Answer the following questions for each piece of related work that addresses the same or a similar problem.
What is their problem and method? How are your problem and method different? Why are your problem and
method better?
5. Future Work
What are the major shortcomings of your current method? For each shortcoming, propose additions or
enhancements that would help overcome it.
6. Conclusion
Briefly summarize the important results and conclusions presented in the paper. What are the most
important points illustrated by your work? How will your results improve future research and applications
in the area?
Bibliography & Citations
Be sure to include a standard, well-formatted, comprehensive bibliography with citations from the text
referring to previously published papers in the scientific literature that you utilized or are related to your
work.