Nerd tips for things you probably won't use: August 2012

Wednesday, August 29, 2012

Combine Matching Datasets

I have multiple text documents that contain pipe delimited data (|). It would be much easier to work with if the information could be appended or otherwise combined into a single data file.

My data are currently stored as 16 individual .txt files, however it would be nice to know how to convert comma seperated data (.csv format),or any other simple text files as well. We can easily combine these using a simple, one line DOS/command prompt command.

16 data files are a nightmare.

Merge documents from a DOS/command prompt:

Click Start :: Run, type cmd and hit enter to bring up a command prompt
Use CD (Change Directory) commands to navigate to the desired working directory of your individual datasets
Use the copy command to create a new file that contains a copy of all of the datasets together. See below for syntax and examples

Syntax:

copy *.[format] [output_filename].[format]

In English this instruction says, "copy the contents of all files in this directory of a certain file format to a new file named whatever."

Example:

copy *.txt combined.txt

Command prompt view:

View of running the copy command

Output folder view:

A single, appended file is wonderful, but it'll need some minimal formatting

Remove headers stored as records:
The data are likely to have originally been exported with field names/headings in the first row.

Three individual text files with a header in the first
row that provides field names for each dataset

These will be included in the output file because the copy function appends every line of the originating datasets into a the new file.

The copy command appended the field headings
into the dataset. These will need to be removed.

Removing these can be as simple as importing the file into spreadsheet processing software, such as Excel, and manually looking through the dataset for these lines and deleting any of those rows. Of course we will leave the very first row of these alone since we will likely want one dataset header at the beginning of the file. The remaining instance will otherwise be stored as actual records and that is incorrect.

Another way is to import the dataset into a database such as MS Access and run a SQL query to remove the erroneous records:

SQL syntax:

DELETE FROM [table name]

WHERE [field 1 name] = "[

field

1 name]" AND

[field 2 name] = "[field 2 name]";

This will begin by attempting to delete rows from the table that was imported. The query is limited to records where the data in a field matches the heading. An AND statement is used to make sure it was not a coincidence that the data matched the field heading. This can be repeated for each field in the dataset if necessary. Here's an example from my data.

Example statement:

DELETE FROM GWISUpdates

 WHERE station_id = "station_id" AND

resource_type = "resource_type";

Upcoming Articles

Create a Drop-Down List in Excel
Important Fundamental SQL Commands
Targeting Duplicates in a Spatial Database
Geotagging Field Recon Photos & Displaying in GIS
Combining Nonspatial Tables with Spatial Datasets
Basic Database Coding (VBA/SQL) for Access
Projecting Raw Data
Emergency Response GIS - Working with Florida's Deepwater Horizon Oil Spill Response Team
Advanced label building with VBScripts
Integrating & publishing GPS data with Google Maps
Working with topology checks
ArcMap Attribution Toolset (Custom program/plugin)
ArcNotepad (Custom program/plugin)
Using the Tab key in VB forms
Geodatabase schema locks
Packaging & publishing custom GIS tools & scripts
Dynamic Landscape Change Modeling & Prediction Techniques (Essay)
Ratios of Florida's Major Stream/Canal miles from the 100k National Hydrography Dataset
Introduction to Basic Electronics
DIY guitar effects
Introduction to Python/Java/object-oriented programming
Modeling Pedestrian Traffic & Recycling Bin Effectiveness at Florida State University
Tornado Warning Utility - Integrating Internet- and desktop-based applications
Statewide streams coverage development methodology
Creating GIS data with Google
Writing and retrieving information from a text file in VB
GRASS: Open source GIS
Linux: Ubuntu / openSUSE
VB Weather App (custom program)
DIY recording cart/studio mixer housing
HTML Tutorial: Image Maps
Introduction to MIDI/Synth
I hate your legend - Cartographic tips
vlookup: Spreadsheet data maniuplation

Nerd tips for things you probably won't use

Wednesday, August 29, 2012

Combine Matching Datasets

About Me

Upcoming Articles

Search by Tags

Blog Archive

Interesting Blogs