Language Model is a Java-based n-gram language model project. It predicts up to two words from a single-word input and provides detailed text analysis statistics. It demonstrates object-oriented programming and design principles and is intended for predictive text input and linguistic analysis.

Vivek-Tate/Language-Model

Language Model Project

Overview

This project is a GUI-based application for language model analysis, designed with object-oriented programming principles for efficient document processing, vocabulary management, and language model prediction.

Features

Main Screen

  • Document Upload: Load and display a document.
  • Document Processing: Process the loaded document with the selected hash function.
  • Vocabulary Display: Show and sort word counts.
  • Statistics Display: View detailed statistics.
  • Language Model Analysis: Access advanced analysis features.

Language Model Analysis Screen

  • Prediction: Predict the next 20 words based on input.
  • Statistics: View model statistics.
  • Navigation: Return to the Main Screen while retaining the loaded data.

Key Concepts

Data Abstraction and Encapsulation

  • Interfaces: For HashFunction and LinkedListObject.
  • Encapsulation: Private variables with public methods.
  • Inheritance: Hierarchical structure for HashFunction implementations.
  • Polymorphism: Interchangeable hash functions in MyHashTable.

Tasks

MyLinkedObject

  • Constructor: Initializes with word and count.
  • Methods: setWord(String w), getWordCount().
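
As a rough illustration of the node described above, here is a minimal sketch. The field names, the `getWord()` and `getNext()` accessors, and the behaviour of `setWord` (increment the count on a match, otherwise pass the word down the chain) are assumptions made for this example, not the project's confirmed implementation.

```java
// Hypothetical sketch of a linked-list node that stores a word and its count.
class MyLinkedObject {
    private final String word;
    private int count;
    private MyLinkedObject next;

    MyLinkedObject(String w) {
        this.word = w;
        this.count = 1; // assumption: a new node records one occurrence
    }

    // Records one occurrence of w somewhere in the chain starting at this node.
    void setWord(String w) {
        if (word.equals(w)) {
            count++;                      // same word: bump the count
        } else if (next == null) {
            next = new MyLinkedObject(w); // end of chain: append a new node
        } else {
            next.setWord(w);              // otherwise delegate to the next node
        }
    }

    String getWord() { return word; }

    int getWordCount() { return count; }

    MyLinkedObject getNext() { return next; }
}
```

With this sketch, calling `setWord` twice with the same word leaves one node with a count of 2, while a new word is appended as a fresh node with a count of 1.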

MyHashFunction

  • First Letter Hash: Uses Unicode value of the first letter.
  • Division Hash: Sums Unicode values of all characters.
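
The two hash functions could look roughly like the sketch below. The interface method signature, the class names `FirstLetterHash` and `DivisionHash`, the reduction modulo the table size, and the handling of empty strings are all assumptions for illustration; the project's own classes may differ.

```java
// Hypothetical interface: maps a word to a bucket index.
interface HashFunction {
    int hash(String word);
}

// Uses only the Unicode value of the first character.
class FirstLetterHash implements HashFunction {
    private final int tableSize;

    FirstLetterHash(int tableSize) {
        this.tableSize = tableSize;
    }

    @Override
    public int hash(String word) {
        if (word.isEmpty()) {
            return 0; // assumption: empty strings go to bucket 0
        }
        return word.charAt(0) % tableSize;
    }
}

// Sums the Unicode values of all characters, then reduces modulo the table size.
class DivisionHash implements HashFunction {
    private final int tableSize;

    DivisionHash(int tableSize) {
        this.tableSize = tableSize;
    }

    @Override
    public int hash(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            sum += word.charAt(i);
        }
        return sum % tableSize;
    }
}
```

With a table size of 10, "cat" maps to bucket 9 under the first-letter sketch ('c' is 99) and to bucket 2 under the division sketch (99 + 97 + 116 = 312). The first-letter version sends every word with the same initial to one bucket, and the sum-based version cannot tell anagrams such as "cat" and "act" apart.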

MyHashTable

  • Encapsulation: Hides internal classes.
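
One way to hide the internal classes while keeping the hash function interchangeable is sketched below. The `add` and `count` methods, the private nested `Node` class, and the constructor signature are hypothetical; the sketch only illustrates the encapsulation and polymorphism ideas, not the project's actual `MyHashTable`.

```java
// Hypothetical interface, repeated here so the sketch is self-contained.
interface HashFunction {
    int hash(String word);
}

class MyHashTable {
    // Private nested class: callers never see the chain nodes.
    private static final class Node {
        final String word;
        int count = 1;
        final Node next;

        Node(String word, Node next) {
            this.word = word;
            this.next = next;
        }
    }

    private final Node[] buckets;
    private final HashFunction hashFunction;

    MyHashTable(int size, HashFunction hashFunction) {
        this.buckets = new Node[size];
        this.hashFunction = hashFunction;
    }

    // floorMod keeps the index in range even if the hash is negative.
    private int indexOf(String word) {
        return Math.floorMod(hashFunction.hash(word), buckets.length);
    }

    // Records one occurrence of the word.
    void add(String word) {
        int i = indexOf(word);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.word.equals(word)) {
                n.count++;
                return;
            }
        }
        buckets[i] = new Node(word, buckets[i]); // new word: prepend to the chain
    }

    // Returns how many times the word was added, or 0 if it was never seen.
    int count(String word) {
        for (Node n = buckets[indexOf(word)]; n != null; n = n.next) {
            if (n.word.equals(word)) {
                return n.count;
            }
        }
        return 0;
    }
}
```

Because the table depends only on the interface, any implementation can be passed in, for example `new MyHashTable(8, String::length)` or `new MyHashTable(8, String::hashCode)`, and the counts it reports stay the same.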

Vocabulary List

  • Sorting: By descending frequency.
  • Statistics: Unique words and word counts.
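
The sorting and statistics above can be sketched with standard collections. This example uses a `HashMap` rather than the project's own hash table, and the class name `VocabularySketch`, the tokenization rule, and the alphabetical tie-break are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VocabularySketch {
    // Counts words; assumption: lower-case the text and split on anything
    // that is not a letter a-z or an apostrophe.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z']+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorts by descending count; ties are broken alphabetically (an assumption).
    static List<Map.Entry<String, Integer>> sortByFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    // Total number of word occurrences in the document.
    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the hat and the bat");
        System.out.println("Unique words: " + counts.size());
        System.out.println("Total words: " + totalWords(counts));
        for (Map.Entry<String, Integer> e : sortByFrequency(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Here the number of unique words is the size of the map, and the total word count is the sum of its values.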

N-grams

  • Unigrams, Bigrams, Trigrams: Probability calculations.
  • Predictions: Based on different n-gram models.
  • Issues with Larger N-grams: Increased complexity and memory requirements.
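
To make the probability calculation concrete, the sketch below estimates bigram probabilities as count(prev next) / count(prev) and predicts by repeatedly choosing the most frequent follower. The class name `BigramSketch`, its method names, the use of `HashMap`, and the alphabetical tie-break are assumptions for this example; the project's own n-gram classes and prediction strategy may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BigramSketch {
    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();

    BigramSketch(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            unigramCounts.merge(tokens[i], 1, Integer::sum);
            if (i + 1 < tokens.length) {
                bigramCounts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                        .merge(tokens[i + 1], 1, Integer::sum);
            }
        }
    }

    // Maximum-likelihood estimate: P(next | prev) = count(prev next) / count(prev).
    double probability(String prev, String next) {
        int prevCount = unigramCounts.getOrDefault(prev, 0);
        if (prevCount == 0) {
            return 0.0;
        }
        Map<String, Integer> followers =
                bigramCounts.getOrDefault(prev, Collections.<String, Integer>emptyMap());
        return (double) followers.getOrDefault(next, 0) / prevCount;
    }

    // Greedy prediction: repeatedly pick the most frequent follower of the
    // current word; ties go to the alphabetically smaller word (an assumption).
    List<String> predict(String start, int maxWords) {
        List<String> out = new ArrayList<>();
        String current = start;
        while (out.size() < maxWords) {
            Map<String, Integer> followers = bigramCounts.get(current);
            if (followers == null) {
                break; // the current word was never followed by anything
            }
            String best = null;
            for (Map.Entry<String, Integer> e : followers.entrySet()) {
                if (best == null
                        || e.getValue() > followers.get(best)
                        || (e.getValue().equals(followers.get(best))
                            && e.getKey().compareTo(best) < 0)) {
                    best = e.getKey();
                }
            }
            out.add(best);
            current = best;
        }
        return out;
    }

    public static void main(String[] args) {
        BigramSketch model = new BigramSketch("the cat sat on the mat the cat ran".split(" "));
        System.out.println(model.probability("the", "cat"));
        System.out.println(model.predict("sat", 3));
    }
}
```

The memory issue with larger n-grams follows from the table sizes: with a vocabulary of V words there are up to V^n distinct n-grams, so a trigram table can need far more entries than a bigram table built from the same text.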

Usage

  1. Main Screen:

    • Click "Load Document" to upload a document.
    • Click "Process Document" to analyze the document.
    • Use sorting and statistics options as needed.
    • Click "Run Language Model Analysis" for advanced analysis.
  2. Language Model Analysis Screen:

    • Enter text for prediction and select a language model.
    • Click "Predict" to see predicted words.
    • Click "Show Statistics" for detailed analysis.
    • Click "Main Page" to return to the main screen.

License

See the LICENSE file for details.


Feel free to reach out via the project's GitHub repository with any issues or contributions. Enjoy exploring the Language Model Project!
