Skip to content

Missing value imputation technique to predict missing time-point from Microarray Time-Series data.

Notifications You must be signed in to change notification settings

abhip2565/MVT-Impute

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 

Repository files navigation

MVT-Impute Jupyter Notebook

This repository contains a Jupyter Notebook that demonstrates techniques for handling missing values in data specifically for time series data.

Table of Contents

Introduction

Welcome to the Data Analysis Notebook dedicated to enhancing your data handling and prediction capabilities. This notebook is your guide to mastering advanced techniques for addressing missing values and leveraging the power of regression to predict future trends.

Bridging the Gap: Interpolation for Missing Values

Handling missing values is a critical aspect of data analysis. In this notebook, we'll explore an efficient technique known as interpolation. Interpolation allows us to estimate missing values within a dataset, providing a complete and accurate representation for analysis.

Unleashing Predictive Insights: Regression for Future Trends

Predicting future trends is a powerful tool for decision-making. Regression analysis enables us to model relationships between variables and make informed predictions. We'll delve into regression techniques that offer valuable insights into the trajectory of your data.

What to Expect

  1. Understanding Missing Data: Gain insights into the types and impact of missing data in your datasets.

  2. Interpolation Techniques: Learn various interpolation methods.

  3. Regression Models: Explore regression models such as linear regression and auto regression for future forecasting.

  4. Practical Application: Apply these techniques to real-world datasets, ensuring you're well-equipped for data analysis challenges.

By the end of this notebook, you'll have the tools to prevent missing values and harness the predictive potential of regression. Let's dive in and empower your data analysis journey!

Getting Started

Follow these steps to get started with the notebook:

  1. Clone this repository to your local machine.
  2. Open the notebook in your preferred environment (Jupyter, Google Colab, etc.).
  3. Run the cells sequentially to learn about interpolation and regression techniques.

Feel free to explore, experiment, and apply these techniques to your own datasets.

Data Source

The model has been tested againt various time-series data and it has shown promising results. The name of the datasets have been mentioned in the notebook and in my conference paper

Data Preprocessing

In this section, we'll cover the essential data preprocessing steps required before applying interpolation techniques. We'll walk through the process of cleaning the dataset, introducing random missing values, converting the data to an array format, and preparing it for interpolation.

Cleaning the Dataset

Before we delve into interpolation, it's crucial to ensure that our dataset is properly cleaned. Data cleaning involves tasks such as handling outliers, addressing duplicate records, and dealing with any data quality issues that might impact our analysis.

Generating Random Missing Values

To simulate real-world scenarios, we'll intentionally introduce random missing values into our dataset. These missing values will serve as the target for the subsequent interpolation techniques.

Converting to Array Format

Many interpolation methods work efficiently with numerical arrays. We'll transform our dataset into an array format that is suitable for applying interpolation techniques.

Imputation Techniques

Interpolation allows us to estimate missing values based on the surrounding data points. We'll explore various interpolation methods, including:

  • Newton's Forward Interpolation: Forward differences and a polynomial estimate values in evenly spaced data. Useful for data completion with equal intervals,it constructs a polynomial passing through points and predicts values within the range.
  • Newton's Backward Interpolation: Backward differences and a polynomial predict values in equally spaced data. Like forward interpolation, it constructs a polynomial that passes through points, aiding data completion and estimation within the range.
  • Lagrange's Interpolation: Lagrange Interpolation is a polynomial interpolation method that estimates values within a range using a single polynomial that passes through given data points. It simplifies the interpolation process by constructing individual polynomials for each data point and combining them to predict values accurately.

Algorithm Demo

Demo!!

Results and Findings

Results have been taken with ample variations which includes:

  • Entire time point(s) missing.
  • percentage of data from time point(s) missing.
  • Scattered portion of data missing.
  • Future time point prediction.

The results, u can find in the paper, whuch i recommend do give a read.

Usage

The secret of this algo lies in its simplicity. Often we dwell into complex algos to predict missing values involving time-series. This algo is a good fit to save somme of your time!! Get your hands dirty with this method in real world datasets, do let me know how well the algo is performing!!

Regards! Thank You!!


For more details, refer to the Jupyter Notebook.

About

Missing value imputation technique to predict missing time-point from Microarray Time-Series data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published