CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
Keith Clarke
keith.clarke at me.com
Tue Apr 5 13:28:23 EDT 2022
Ah, thanks Alex - I’ll dig into that.
I did search around for "CSV to TSV" in several places before posting, but not for "CSV to Tab", and not on GitHub!
Best,
Keith
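
For reference, the quote-aware walk that Alex describes in the message quoted below can be sketched roughly as follows. This is only a minimal illustration of the idea, not the csvToTab handler from the linked repository; the function name sketchCsvToTsv and its parameter pData are hypothetical. It assumes comma-delimited input, replaces any literal tab with a space, and does nothing special about line breaks inside quoted cells. A doubled quote inside a quoted cell simply toggles the flag twice, so that literal quote is dropped rather than preserved.

```livecode
-- Hypothetical sketch: NOT the community csvToTab handler.
-- Walks the data once, tracking whether we are inside a quoted cell.
function sketchCsvToTsv pData
   local tInQuotes, tOut, tChar
   put false into tInQuotes
   repeat for each char tChar in pData
      if tChar is quote then
         -- entering or leaving a quoted cell; the quote itself is not copied
         put not tInQuotes into tInQuotes
      else if tChar is comma and not tInQuotes then
         -- a comma outside quotes is a real delimiter
         put tab after tOut
      else if tChar is tab then
         -- keep stray tabs from being mistaken for the new delimiter
         put space after tOut
      else
         -- everything else, including commas inside quotes, is copied as-is
         put tChar after tOut
      end if
   end repeat
   return tOut
end function
```

It could be called as "put sketchCsvToTsv(tFileData) into tFileData" before setting the itemDelimiter to tab. For real data the linked csvToTab is likely the better choice, since it covers cases this sketch ignores.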
> On 5 Apr 2022, at 18:17, Alex Tweedly via use-livecode <use-livecode at lists.runrev.com> wrote:
>
> Hi Keith,
>
> That code will fail for any commas which occur within quoted entries - they will be wrongly converted to TABs.
>
> I'd suggest getting csvToTab (a community effort by Richard Gaskin, me and a whole host of others over the years) as a good starting place, and perhaps finishing place. It will handle most CSV oddities (not all of them - that is provably impossible :-).
>
> This does an efficient walk through the data, remembering whether it is inside or outside quoted entries, and hence handles commas accordingly.
>
> https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
>
> Alex.
>
> On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:
>> Hi folks,
>> Thanks all for the responses and ideas on consolidating multiple CSV files - much appreciated.
>>
>> Ben - Thank you for sharing your working recipe. This lifted my spirits as it showed I was on the right path (very nearly!) and you moved me on a big step from where I was stuck.
>>
>> My script was successfully iterating through folders and files, with filtering to get a file list of just CSVs with their paths for onward processing. I’d also identified the need to maintain registers of (growing) column names, together with a master row template and a mapping of the current file’s column headers in row 1 to the master, to align output columns. I got stuck when I set up nested repeat loops for files, then lines, then items and was trying to deal with row 1 column headers and data rows at the same time, which got rather confusing. Separating the column name processing from parsing row data made life a lot simpler and I’ve now got LC parsing the ~200 CSV files into a ~60,000 row TSV file that opens in Excel.
>>
>> However… I’m getting cells dropped into the wrong columns in the output file. So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can anyone spot any obvious errors or omissions in the following...
>> -- convert from CSV to TSV
>> replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter
>> replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values
>> replace comma with tab in tFileData -- change delimiter for unquoted values
>> replace quote with "" in tFileData -- clear quotes in first & last items
>> set the itemDelimiter to tab
>>
>> Best,
>> Keith
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode