Data recognition, parsing, filtering, and transformation: is there a GUI?
I'm looking for a non-cloud-based, open-source app for data transformation. For a killer (and I mean killer) app built just for data transformations, though, I might be willing to spend up to $1,000.
Perl, Python, and Ruby are clearly languages, but I'm unable to find any frameworks/DSLs built just for processing data. That means they're not really great development environments for this job: there are no built-in GUIs for building regexes or Input/Output (CSV, XML, JDBC, REST, etc.), and no debugger for testing rows and rows of data. They're not bad either, just not what I'm looking for, which is a GUI built for complex data transformations. That said, I'd love it if the GUI/app's project file were written in a scripting language, and NOT just stored in some non-human-readable XML/ASCII file.
So, any suggestions? I realize I haven't really spec'd out any transformations, but I figure if you actually use a product for data munging, I'd like to know about it, even Excel, I guess.
In general, though, I'm currently looking for a product that can handle 1,000 to 100,000 rows with 10 to 100 columns. It'd be super cool if it could profile data sets, which is a feature Kettle sort of has, but not super well. I'd also like built-in unit testing, meaning I can build out control sets of data and run any changes I make against the control set. Then I'd like to be able to selectively filter out rows and columns as I build out the transformation without altering the build. For example, I run a data set through the transformation and filter the results, and on the next run those rows are automatically blocked at the first "logical" occurrence, which in turn means less data to "look at" and a shorter runtime per iteration. What would be crazy nice is if, as I'm filtering out rows/columns, the app tracked them (with the output filtered accordingly) and unit tested/highlighted any changes. If I made a change that would affect the application logs and their ability to track the unit tests because I "broke a branch", it'd give me a warning and let me dump the data stored for that branch, and/or track the primary keys for differences in the next generation of output, or even attempt to match them using fuzzy logic. And yes, I know this is a pipe dream, but hey, I figured I'd ask, just in case there's something out there I've just never seen.
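To make the control-set idea a bit more concrete, here's a rough Python sketch of the primary-key diff I have in mind. It's my own illustration, not the API of any existing tool: compare a run's output against a control set keyed on a primary-key column, report added, removed, and changed rows, and then try to fuzzy-match keys that disappeared against keys that appeared, using the standard library's `difflib` as a crude stand-in for "fuzzy logic". The function name `diff_by_key`, the `cutoff` value, and the sample rows are all hypothetical.

```python
import difflib


def diff_by_key(control, output, key, cutoff=0.8):
    """Compare two row sets (lists of dicts) on a primary-key column.

    Hypothetical helper: reports keys added/removed/changed relative to the
    control set, plus a best-guess fuzzy match for each removed key.
    Assumes all keys in a column share one type, so they can be sorted.
    """
    ctrl = {row[key]: row for row in control}
    out = {row[key]: row for row in output}

    removed = sorted(ctrl.keys() - out.keys())
    added = sorted(out.keys() - ctrl.keys())
    # Same key in both runs, but at least one column value differs.
    changed = sorted(k for k in ctrl.keys() & out.keys() if ctrl[k] != out[k])

    # Crude fuzzy matching: string similarity between a vanished key and
    # the newly appeared keys (difflib's ratio must be >= cutoff).
    candidates = [str(a) for a in added]
    fuzzy = {}
    for k in removed:
        match = difflib.get_close_matches(str(k), candidates, n=1, cutoff=cutoff)
        if match:
            fuzzy[k] = match[0]

    return {"added": added, "removed": removed, "changed": changed, "fuzzy": fuzzy}


# Hypothetical control set and a later run of the transformation.
control = [
    {"id": "ACME-001", "qty": "5"},
    {"id": "ACME-002", "qty": "7"},
    {"id": "BETA-100", "qty": "1"},
]
output = [
    {"id": "ACME-001", "qty": "5"},
    {"id": "ACME-002", "qty": "8"},
    {"id": "BETA-1000", "qty": "1"},
]

print(diff_by_key(control, output, "id"))
```

Rows here are plain dicts (the shape `csv.DictReader` yields), and string similarity on keys is only one simple form of fuzzy matching. What I'd really want is a GUI that does this kind of tracking automatically on every run.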
Feel free to comment; I'd be happy to answer any questions or offer additional info.