You probably noticed in the featured image of this post that there are some funny negative values in the energy data. You might have seen some gaps as well. All of these are errors in the data that must be corrected before analyzing it. The data set has to be free of missing values, duplicates, and outliers, with values at regular time intervals. In this post, I will show how smart grid data exported as a CSV file from a database can be cleaned by addressing the most commonly found errors. I will also discuss some extreme cases where what may look like an error is in fact not one.
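To make those four requirements concrete, here is a minimal sketch that only counts the problems in a series of readings; it does not fix anything. It is an illustration under stated assumptions, not the cleaning procedure itself: the function name `audit_readings`, the 15-minute interval, the sample values, and the use of "negative value" as a stand-in for an outlier are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta


def audit_readings(rows, step=timedelta(minutes=15)):
    """Count common problems in (timestamp, value) rows sorted by timestamp.

    `step` is the expected sampling interval (15 minutes is an assumption).
    A value of None stands for a missing reading.
    """
    report = {"missing": 0, "duplicates": 0, "negative": 0, "gaps": 0}
    seen = set()
    previous = None
    for stamp, value in rows:
        if stamp in seen:
            # Same timestamp appears more than once.
            report["duplicates"] += 1
            continue
        seen.add(stamp)
        if value is None:
            report["missing"] += 1
        elif value < 0:
            # Crude outlier proxy: negative energy is treated as suspicious.
            report["negative"] += 1
        if previous is not None and stamp - previous != step:
            # The spacing to the previous reading is not the expected interval.
            report["gaps"] += 1
        previous = stamp
    return report


# Made-up sample data, one instance of each problem.
start = datetime(2020, 1, 1, 0, 0)
quarter = timedelta(minutes=15)
rows = [
    (start, 1.2),
    (start + quarter, None),       # missing value
    (start + quarter, None),       # duplicate timestamp
    (start + 2 * quarter, -3.4),   # negative reading
    (start + 4 * quarter, 1.1),    # the 00:45 reading is absent
]
report = audit_readings(rows)
print(report)
```

A count like this is only a first look. Whether a flagged reading is really an error still depends on the context of the measurement.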
As usual, let me know your thoughts on this in the comments below.