Thursday, September 27, 2012

Data Formatting Using a VBA Macro: Example with Data from PCPartPicker.com

I want to illustrate the cost-to-power and cost-to-memory for a variety of video cards.  PCPartPicker.com is an amazing tool that helps piece together a computer by comparing hardware components while directing users to the best price for each part.

I did a basic search of just video cards sorted by price.  I then manually selected and copied the text straight from the browser and pasted it into a spreadsheet.  Optionally, to remove the formatting go to the Edit menu and choose the Clear sub-menu and click on Formats; or paste the selection into a blank Notepad document, then re-copy and paste into the spreadsheet.
 
The structure of the first row (of field headings) is fine, however each record of data is spread across three rows.  Notice that the rating for the first Sapphire video card is blank (E2) and its value for rating and price are on additional lines (A3 and B4, respectively).

Formatting errors occur when directly copying raw data to a spreadsheet

I wrote a simple macro to perform the hundreds of formats automatically.  Here's how it works:
  • Cut the Rating value from A3 and paste it in cell E2
  • Delete row 3 since it's blank
  • Repeat the procedure for the Price value, then remove that row as well (remember: row 4 becomes row 3 when we deleted row 3 because it was blank)
  • Formatting of the this record is complete; move on to the next row of data

The following VBA code is a macro that will iterate through the spreadsheet and reformat the data appropriately onto a single line for each record.  Copy and paste this into Excel's Visual Basic Editor and tweak to fit your data.  All you need to do is to change the row number of the first out of place value, and change how many times the model will iterate (highlighted below).

For instance, my first error occurs on row 3.  Additionally, I'll let the model run 783 times.  783 comes from the original unformatted dataset (2,350 rows) divided by 3 (the number of lines an individual record spans - which we are reducing to 1) which = 783.33.  You can round up to 800 and the model will just run on blank rows.  If your figure was not enough, you can simply run the model again, starting at the position of the next error (say 781), and set it to run just 4 more times to format 783 total records (+1 for the field headings in row 1 and you will end up with 784 rows total for 783 records).

Macro Code:

Sub FormatVideoCards()
' Created 9/27/2012 by Justin Berke
' http://justinberke.blogspot.com


Dim i As Integer 'Model position

Dim j As Integer 'Loop counter

'Enter row number where the first error occurs

  i = 3

'Run model

 For j = 1 To 783

  'Cut rating
    Application.CutCopyMode = False

    Range("A" & i).Select
    Selection.Cut
  'Paste rating
    Range("E" & (i - 1)).Select
    ActiveSheet.Paste
  'Delete row
    Rows(i & ":" & i).Select
    Selection.Delete Shift:=xlUp

  'Cut price
    Range("B" & i).Select
    Selection.Cut
  'Paste price
    Range("F" & (i - 1)).Select
    ActiveSheet.Paste
  'Delete row
    Rows(i & ":" & i).Select
    Selection.Delete Shift:=xlUp

  'Move model position forward
    i = i + 1
 
Next j

End Sub



As one last step, multiply the rating by -1.  For some reason, those are imported as negative figures.  The result is a well formatted record set.

Properly formatted dataset