CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

Keith Clarke keith.clarke at me.com
Tue Apr 5 13:28:23 EDT 2022


Ah, thanks Alex - I’ll dig into that.
I did search for "CSV to TSV" in several places before posting, but not for "CSV to Tab", and not on GitHub!
Best,
Keith
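
For anyone reading this in the archive: the failure Alex describes in the quoted message below, and the quote-aware walk he recommends, can be illustrated roughly as follows. This is only a minimal sketch in Python (not LiveCode) and is not the csvToTab script itself. The function names, the sample row, and the choice to flatten embedded tabs and newlines to spaces are all assumptions made for illustration.

```python
def naive_csv_to_tsv(data: str) -> str:
    """Mirror of the four blind replacements in the quoted script."""
    data = data.replace("\t", " ")    # clear tabs in the content
    data = data.replace('","', "\t")  # delimiter between two quoted values
    data = data.replace(",", "\t")    # every remaining comma, quoted or not
    return data.replace('"', "")      # strip the leftover quotes


def quote_aware_csv_to_tsv(data: str) -> str:
    """Walk the text once, tracking whether we are inside a quoted field."""
    out = []
    inside_quotes = False
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == '"':
            if inside_quotes and data[i + 1 : i + 2] == '"':
                out.append('"')  # doubled quote inside a field is a literal quote
                i += 1           # skip the second quote of the pair
            else:
                inside_quotes = not inside_quotes  # opening or closing quote
        elif ch == "," and not inside_quotes:
            out.append("\t")  # a real field delimiter
        elif ch == "\t" or (ch in "\r\n" and inside_quotes):
            out.append(" ")   # assumption: flatten embedded tabs/newlines to spaces
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Hypothetical sample row with a comma inside a quoted field.
row = 'Smith,"12 High St, Leeds",42'
print(naive_csv_to_tsv(row).split("\t"))        # the address is split in two
print(quote_aware_csv_to_tsv(row).split("\t"))  # the address stays in one field
```

With the sample row, the blind replacements yield four fields instead of three, which matches the "cells dropped into the wrong columns" symptom. The single-pass version keeps the quoted comma because it only converts commas seen outside quotes.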

> On 5 Apr 2022, at 18:17, Alex Tweedly via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> Hi Keith,
> 
> That code will fail for any commas that occur within quoted entries: they will be wrongly converted to TABs.
> 
> I'd suggest getting csvToTab (a community effort by Richard Gaskin, me and a whole host of others over the years) as a good starting place, and perhaps finishing place. It will handle most CSV oddities (not all of them - that is provably impossible :-).
> 
> This does an efficient walk through the data, remembering whether it is inside or outside quoted entries, and hence handles commas accordingly.
> 
> https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
> 
> Alex.
> 
> On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:
>> Hi folks,
>> Thanks all for the responses and ideas on consolidating multiple CSV files - much appreciated.
>> 
>> Ben - Thank you for sharing your working recipe. This lifted my spirits as it showed I was on the right path (very nearly!) and you moved me on a big step from where I was stuck.
>> 
>> My script was successfully iterating through folders and files, with filtering to get a file list of just CSVs with their paths for onward processing. I’d also identified the need to maintain registers of (growing) column names, together with a master row template and a mapping of the current file’s column headers in row 1 to the master, to align output columns. I got stuck when I set up nested repeat loops for files, then lines, then items, and was trying to deal with row 1 column headers and data rows at the same time, which got rather confusing. Separating the column name processing from parsing row data made life a lot simpler, and I’ve now got LC parsing the ~200 CSV files into a ~60,000 row TSV file that opens in Excel.
>> 
>> However… I’m getting cells dropped into the wrong columns in the output file. So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can anyone spot any obvious errors or omissions in the following...
>> -- convert from CSV to TSV
>> 
>> replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter
>> 
>> replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values
>> 
>> replace comma with tab in tFileData -- change delimiter for unquoted values
>> 
>> replace quote with "" in tFileData -- clear quotes in first & last items
>> 
>> set the itemDelimiter to tab
>> 
>> Best,
>> Keith
>>    _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 



