Abstract:
Missing data is a common problem in machine learning, typically caused by human error, bugs in data processing software, incorrect sensor readings, and similar faults. Because missing data can degrade the performance of machine learning models, missing data imputation is of great importance for machine learning tasks. To address this problem, a novel missing data imputation method is proposed, based on a generative adversarial imputation network (GAIN). GAIN consists of two main components, a generator and a discriminator. The generator (G) observes the components of the real data that are present, imputes the missing components conditioned on those observations, and outputs a completed vector. The discriminator (D) takes a completed vector and attempts to determine which components were actually observed and which were imputed. Experimental results on four public UCI datasets and real drilling fluid datasets verify that GAIN is effective and can improve the performance of machine learning tasks.
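
The roles of G and D can be illustrated with a small sketch. This is not the trained network proposed here: `mean_generator` is a hypothetical stand-in for G that simply proposes column means, and every function name below is an assumption introduced for illustration. The sketch shows only how a binary mask separates observed from imputed components, and that this mask is the quantity D tries to recover from the completed vector.

```python
from typing import List, Optional

Row = List[Optional[float]]


def mask_of(rows: List[Row]) -> List[List[int]]:
    """Binary mask: 1 where a component is observed, 0 where it is missing."""
    return [[0 if v is None else 1 for v in row] for row in rows]


def mean_generator(rows: List[Row]) -> List[List[float]]:
    """Hypothetical stand-in for G (assumption, not the trained network).

    Proposes a value for every component: the mean of the observed
    entries in that column, or 0.0 if the column has no observed entry.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [list(means) for _ in rows]


def impute(rows: List[Row], proposals: List[List[float]]) -> List[List[float]]:
    """Keep observed components and fill only the missing ones from the proposals."""
    mask = mask_of(rows)
    return [
        [x if m == 1 else g for x, m, g in zip(row, mask_row, prop_row)]
        for row, mask_row, prop_row in zip(rows, mask, proposals)
    ]


# Toy data: None marks a missing component.
data = [[1.0, None], [3.0, 4.0]]
completed = impute(data, mean_generator(data))
target_for_d = mask_of(data)  # what D would be trained to predict
```

In an adversarial setup, G would be rewarded when D cannot tell its imputed components from the observed ones, so the fixed column-mean rule above would be replaced by a learned model.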