How to extract specific columns/line items from a CSV file?
Peter M. Brigham
pmbrig at gmail.com
Tue Sep 27 08:36:53 EDT 2016
I’m late to this party (these days too busy to peruse the list more than episodically).
Don’t know if this would be faster or slower or equivalent, but it’s a modular function:
function getColumns pData, pColNbrs, pDelim
   -- Extract specified columns from a table, in order
   -- pData: a tab-delimited table (delim may be overridden with pDelim)
   -- pColNbrs: a comma-separated list of columns or column ranges
   --    to be returned in the order indicated, for example "2,7,5"
   --    or an ascending column range: "3-5"
   --    or a descending column range: "5-3"
   --    or a combination: "7,4-5,2,11-9"
   -- based on deleteColumns() by Hugh Senior & Michael Doub et al., use-LC list
   -- adjusted by Peter M. Brigham, pmbrig at gmail.com
   -- requires howMany()
   if pColNbrs = empty then return pData
   if pDelim = empty then put tab into pDelim
   -- first expand column ranges
   repeat for each item pCol in pColNbrs
      put offset("-",pCol) into dashPos
      if dashPos = 0 then
         put pCol & comma after tColsToReturn
      else if dashPos = 1 then
         -- for column numbers like -1, -2, etc.
         put howMany(pDelim,line 1 of pData) + 1 into nbrItems
         put pCol + 1 + nbrItems & comma after tColsToReturn
      else
         -- a column range
         put char 1 to dashPos-1 of pCol into firstColNbr
         put char dashPos+1 to -1 of pCol into lastColNbr
         if firstColNbr < lastColNbr then
            repeat with i = firstColNbr to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         else
            repeat with i = firstColNbr down to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         end if
      end if
   end repeat
   delete char -1 of tColsToReturn
   -- then extract the columns in order
   set the columnDelimiter to pDelim
   split pData by column
   repeat for each item n in tColsToReturn
      add 1 to x
      put pData[n] into rData[x]
   end repeat
   combine rData by column
   return rData
end getColumns
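For anyone following along outside LiveCode, here is a rough Python sketch of the same idea (expand the column spec, then pull the columns out in the requested order). It is illustrative only; the function names are mine, column numbers are 1-based to match LiveCode's item numbering, and negative-range specs like "-3--1" are not handled:

```python
def expand_cols(spec, ncols):
    """Expand a spec like "7,4-5,2,11-9" into a flat list of 1-based columns.

    A bare negative number (e.g. "-1") counts from the right;
    a descending range (e.g. "11-9") yields the columns in reverse.
    """
    wanted = []
    for part in spec.split(","):
        part = part.strip()
        if part.startswith("-"):            # e.g. "-1" means the last column
            wanted.append(ncols + int(part) + 1)
        elif "-" in part:                   # a range like "4-5" or "11-9"
            lo, hi = map(int, part.split("-"))
            step = 1 if lo <= hi else -1
            wanted.extend(range(lo, hi + step, step))
        else:
            wanted.append(int(part))
    return wanted

def get_columns(data, spec, delim="\t"):
    """Return the selected columns of a delimited table, in spec order."""
    rows = [line.split(delim) for line in data.splitlines()]
    cols = expand_cols(spec, len(rows[0]))
    return "\n".join(delim.join(row[c - 1] for c in cols) for row in rows)
```

For example, `get_columns("a\tb\tc\n1\t2\t3", "3,1")` returns the third and first columns of each row.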
function howMany pStr, pContainer, pCaseSens
   -- how many times pStr occurs in pContainer
   -- note that howMany("xx","xxxxxx") returns 3, not 5
   -- ie, overlapping matches are not counted
   -- by Peter M. Brigham, pmbrig at gmail.com -- freeware
   if pCaseSens = empty then put false into pCaseSens
   set the caseSensitive to pCaseSens
   if pStr is not in pContainer then return 0
   put len(pContainer) into origLength
   replace pStr with char 2 to -1 of pStr in pContainer
   return origLength - len(pContainer)
end howMany
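For comparison, the non-overlapping count that howMany() computes is exactly what Python's built-in `str.count` does; a minimal sketch (the function name and default-insensitive behavior mirror the LiveCode handler, but this is my own illustration):

```python
def how_many(needle, haystack, case_sensitive=False):
    """Count non-overlapping occurrences of needle in haystack,
    case-insensitively by default, like the LiveCode howMany()."""
    if not case_sensitive:
        needle, haystack = needle.lower(), haystack.lower()
    return haystack.count(needle)
```

So `how_many("xx", "xxxxxx")` is 3, not 5, matching the comment in the LiveCode version.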
> On Sep 20, 2016, at 5:14 PM, Keith Clarke <keith.clarke at me.com> wrote:
>
> Thanks Mike (& Craig) for confirming that iteration of some sort is the way to go.
>
> It’s interesting that the url container can be addressed directly, instead of loading the file - I’d not seen that before. I’m currently pulling each file into a regular variable, so will probably stick with that as only a few (existing) lines of code would be saved and the net overall effect on timing / resources is probably marginal if the file gets loaded behind the scenes anyway.
>
> As the column names/numbers vary by CSV file name - which I have built-out in a ColumnsOfInterest field - it looks like I’ll also be safe to iterate through these items to build the row extraction line dynamically, too.
>
> Great, thanks again - I have a direction of travel.
> Best,
> Keith..
>
>> On 20 Sep 2016, at 21:32, Mike Bonner <bonnmike at gmail.com> wrote:
>>
>> If it's a file, you can do this:
>>
>> repeat for each line tLine in url "file:yourfilename.csv"
>>    put item 1 of tLine & comma & item 3 of tLine & comma \
>>          & item 15 of tLine & return after tExtracted
>> end repeat
>> delete the last char of tExtracted -- remove the extraneous return
>>
>> Or you can put them into an array or whatever. It should be VERY fast. If
>> the files are very large, you can instead open the file for read, read a
>> line, extract the data as above, and repeat till the end of the file. I
>> think using repeat for each with a file url loads the whole thing at once
>> then goes line to line, but I'm not sure of that.
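For very large files, the read-a-line-at-a-time approach described above can be sketched in Python with the stdlib csv module (the file path and column numbers here are placeholders, not values from the thread):

```python
import csv

def extract_columns(path, col_indexes):
    """Stream a CSV file row by row, yielding only the wanted columns.

    col_indexes are 1-based, matching LiveCode's item numbering.
    Only one row is held in memory at a time, so file size doesn't matter.
    """
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield [row[i - 1] if i <= len(row) else "" for i in col_indexes]
```

Because it is a generator, you can feed the results straight into a loop or `"\n".join(...)` without ever loading the whole file.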
>>
>> On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <keith.clarke at me.com> wrote:
>>
>>> Hi folks,
>>> I’ve a large number of sizeable CSV files from each of which I need to
>>> extract just 2 or 3 specific columns. Creating nested iterations through
>>> every item in every line seems somewhat inefficient - as does loading each
>>> full file into memory - so I feel I must be missing a trick here.
>>>
>>> Does LiveCode support any elegant way of directly manipulating or
>>> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
>>> columns/items - such as 1 (Id), 3 (Name) & 15 (Summary) - from each line
>>> of a CSV container or directly from a file/URL?
>>> Best,
>>> Keith..
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>