My data are currently stored as 16 individual .txt files, however it would be nice to know how to convert comma seperated data (.csv format),or any other simple text files as well. We can easily combine these using a simple, one line DOS/command prompt command.
16 data files are a nightmare. |
Merge documents from a DOS/command prompt:
- Click Start :: Run, type cmd and hit enter to bring up a command prompt
- Use CD (Change Directory) commands to navigate to the desired working directory of your individual datasets
- Use the copy command to create a new file that contains a copy of all of the datasets together. See below for syntax and examples
Syntax:
copy *.[format] [output_filename].[format]
In English this instruction says, "copy the contents of all files in this directory of a certain file format to a new file named whatever."
Example:
copy *.txt combined.txt
Command prompt view:
View of running the copy command |
Output folder view:
A single, appended file is wonderful, but it'll need some minimal formatting |
Remove headers stored as records:
The data are likely to have originally been exported with field names/headings in the first row.
Three individual text files with a header in the first row that provides field names for each dataset |
These will be included in the output file because the copy function appends every line of the originating datasets into a the new file.
The copy command appended the field headings into the dataset. These will need to be removed. |
Another way is to import the dataset into a database such as MS Access and run a SQL query to remove the erroneous records:
SQL syntax:
DELETE FROM [table name]
WHERE [field 1 name] = "[
field
1 name]" AND
[
field
2 name] = "[
field
2 name]";
This will begin by attempting to delete rows from the table that was imported. The query is limited to records where the data in a field matches the heading. An AND statement is used to make sure it was not a coincidence that the data matched the field heading. This can be repeated for each field in the dataset if necessary. Here's an example from my data.
Example statement:
DELETE FROM GWISUpdates
WHERE station_id = "station_id" AND
resource_type = "resource_type";