I did a basic search of just video cards sorted by price. I then manually selected and copied the text straight from the browser and pasted it into a spreadsheet. Optionally, to remove the formatting go to the Edit menu and choose the Clear sub-menu and click on Formats; or paste the selection into a blank Notepad document, then re-copy and paste into the spreadsheet.
The structure of the first row (of field headings) is fine, however each record of data is spread across three rows. Notice that the rating for the first Sapphire video card is blank (E2) and its value for rating and price are on additional lines (A3 and B4, respectively).
Formatting errors occur when directly copying raw data to a spreadsheet |
I wrote a simple macro to perform the hundreds of formats automatically. Here's how it works:
- Cut the Rating value from A3 and paste it in cell E2
- Delete row 3 since it's blank
- Repeat the procedure for the Price value, then remove that row as well (remember: row 4 becomes row 3 when we deleted row 3 because it was blank)
- Formatting of the this record is complete; move on to the next row of data
The following VBA code is a macro that will iterate through the spreadsheet and reformat the data appropriately onto a single line for each record. Copy and paste this into Excel's Visual Basic Editor and tweak to fit your data. All you need to do is to change the row number of the first out of place value, and change how many times the model will iterate (highlighted below).
For instance, my first error occurs on row 3. Additionally, I'll let the model run 783 times. 783 comes from the original unformatted dataset (2,350 rows) divided by 3 (the number of lines an individual record spans - which we are reducing to 1) which = 783.33. You can round up to 800 and the model will just run on blank rows. If your figure was not enough, you can simply run the model again, starting at the position of the next error (say 781), and set it to run just 4 more times to format 783 total records (+1 for the field headings in row 1 and you will end up with 784 rows total for 783 records).
Macro Code:
Sub FormatVideoCards()
' Created 9/27/2012 by Justin Berke
' http://justinberke.blogspot.com
Dim i As Integer 'Model position
Dim j As Integer 'Loop counter
'Enter row number where the first error occurs
i = 3
'Run model
For
j = 1
To
783
'Cut rating
Application.CutCopyMode = False
Range("A" & i).Select
Selection.Cut
'Paste rating
Range("E" & (i - 1)).Select
ActiveSheet.Paste
'Delete row
Rows(i & ":" & i).Select
Selection.Delete Shift:=xlUp
'Cut price
Range("B" & i).Select
Selection.Cut
'Paste price
Range("F" & (i - 1)).Select
ActiveSheet.Paste
'Delete row
Rows(i & ":" & i).Select
Selection.Delete Shift:=xlUp
'Move model position forward
i = i + 1
Next
j
End Sub
As one last step, multiply the rating by -1. For some reason, those are imported as negative figures. The result is a well formatted record set.
Properly formatted dataset |
2 comments:
Thank you , Chris, I found this exercise highly interesting.
Data formatting
Who's Chris? I'm Justin.
Post a Comment