How to extract specific columns/line items from a CSV file?

Peter M. Brigham pmbrig at gmail.com
Tue Sep 27 08:36:53 EDT 2016


I’m late to this party (these days too busy to peruse the list more than episodically).

I don’t know whether this would be faster, slower, or equivalent, but it’s a modular function:

function getColumns pData, pColNbrs, pDelim
   -- Extract specified columns from a table in order
   --    pData: a tab-delimited table (delim may be overridden with pDelim)
   --    pColNbrs: A comma separated list of columns or column ranges
   --       to be returned in the order indicated, for example "2,7,5"
   --       or an ascending column range: "3-5"
   --       or a descending column range: "5-3"
   --       or a combination "7,4-5,2,11-9"
   -- based on deleteColumns() by Hugh Senior & Michael Doub et al., use-LC list
   --     adjusted by Peter M. Brigham, pmbrig at gmail.com
   -- requires howMany()
   
   if pColNbrs = empty then return pData
   if pDelim = empty then put tab into pDelim
   -- first expand column ranges
   repeat for each item pCol in pColNbrs
      put offset("-",pCol) into dashPos
      if dashPos = 0 then
         put pCol & comma after tColsToReturn
      else if dashPos = 1 then
         -- for column numbers like -1, -2, etc.
         put howMany(pDelim,line 1 of pData) + 1 into nbrItems
         put pCol + 1 + nbrItems & comma after tColsToReturn
      else
         -- a column range
         put char 1 to dashPos-1 of pCol into firstColNbr
         put char dashPos+1 to -1 of pCol into lastColNbr
         if firstColNbr < lastColNbr then
            repeat with i = firstColNbr to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         else
            repeat with i = firstColNbr down to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         end if
      end if
   end repeat
   delete char -1 of tColsToReturn
   -- then extract columns in order
   set the columnDelimiter to pDelim
   split pData by column
   repeat for each item n in tColsToReturn
      add 1 to x
      put pData[n] into rData[x]
   end repeat
   combine rData by column
   return rData
end getColumns
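For anyone who wants to compare the logic against another language, here is a rough Python rendering of the same expand-then-extract approach (this is just an illustrative sketch, not part of the LiveCode solution; the name get_columns and its 1-based column convention simply mirror the handler above):

```python
def get_columns(data, col_spec, delim="\t"):
    """Return the columns named in col_spec, in the order given.

    col_spec uses 1-based column numbers, e.g. "2,7,5", an ascending
    range "3-5", a descending range "5-3", or a mix "7,4-5,2,11-9".
    Negative numbers count from the right, as in the LiveCode handler.
    """
    if not col_spec:
        return data
    rows = [line.split(delim) for line in data.split("\n")]
    ncols = len(rows[0])
    wanted = []
    for item in col_spec.split(","):
        dash = item.find("-", 1)  # start at 1 so a leading minus sign is ignored
        if dash == -1:
            n = int(item)
            # map a negative column number to its positive equivalent
            wanted.append(n + ncols + 1 if n < 0 else n)
        else:
            first, last = int(item[:dash]), int(item[dash + 1:])
            step = 1 if first <= last else -1
            wanted.extend(range(first, last + step, step))
    # rebuild the table, row by row, with the columns in the requested order
    return "\n".join(delim.join(row[n - 1] for n in wanted) for row in rows)
```

Same idea, different dialect: expand the column spec into a flat list first, then do one pass over the rows.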

function howmany pStr, pContainer, pCaseSens
   -- how many times pStr occurs in pContainer
   -- note that howmany("xx","xxxxxx") returns 3, not 5
   --     ie,  overlapping matches are not counted
   -- by Peter M. Brigham, pmbrig at gmail.com — freeware
   
   if pCaseSens = empty then put false into pCaseSens
   set the casesensitive to pCaseSens
   if pStr is not in pContainer then return 0
   put length(pContainer) into origLength
   replace pStr with char 2 to -1 of pStr in pContainer
   return origLength - length(pContainer)
end howmany
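The length-difference trick translates directly; for comparison, here is a hypothetical Python rendering (not from the original post). Note that Python’s built-in str.count() already counts non-overlapping matches the same way, so the two agree:

```python
def how_many(s, container, case_sensitive=False):
    """Count non-overlapping occurrences of s in container."""
    if not case_sensitive:
        s, container = s.lower(), container.lower()
    if s not in container:
        return 0
    # Replacing each match with the match minus its first character
    # shrinks the string by exactly one char per occurrence, so the
    # length difference is the count.
    return len(container) - len(container.replace(s, s[1:]))
```

As in the LiveCode original, how_many("xx", "xxxxxx") gives 3, not 5, because overlapping matches are not counted.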


> On Sep 20, 2016, at 5:14 PM, Keith Clarke <keith.clarke at me.com> wrote:
> 
> Thanks Mike (& Craig) for confirming that iteration of some sort is the way to go. 
> 
> It’s interesting that the url container can be addressed directly, instead of loading the file - I’d not seen that before. I’m currently pulling each file into a regular variable, so will probably stick with that as only a few (existing) lines of code would be saved and the net overall effect on timing / resources is probably marginal if the file gets loaded behind the scenes anyway.
> 
> As the column names/numbers vary by CSV file name - which I have built-out in a ColumnsOfInterest field - it looks like I’ll also be safe to iterate through these items to build the row extraction line dynamically, too.
> 
> Great, thanks again - I have a direction of travel.
> Best,
> Keith..
> 
>> On 20 Sep 2016, at 21:32, Mike Bonner <bonnmike at gmail.com> wrote:
>> 
>> If a file you can do this..
>> 
>> repeat for each line tLine in url "file:yourfilename.csv"
>>    put item 1 of tLine & comma & item 3 of tLine & comma & \
>>          item 15 of tLine & return after tExtracted
>> end repeat
>> delete the last char of tExtracted -- remove the extraneous return
>> 
>> Or you can put them into an array or whatever. It should be VERY fast.  If
>> the files are very large, you can instead open the file for read, read a
>> line, extract the data as above, and repeat till the end of the file. I
>> think using repeat for each with a file url loads the whole thing at once
>> then goes line to line, but I'm not sure of that.
>> 
>> On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <keith.clarke at me.com> wrote:
>> 
>>> Hi folks,
>>> I’ve a large number of sizeable CSV files from each of which I need to
>>> extract just 2 or 3 specific columns. Creating nested iterations through
>>> every item in every line seems somewhat inefficient - as does loading each
>>> full file into memory - so I feel I must be missing a trick here.
>>> 
>>> Does Livecode support any elegant way of directly manipulating or
>>> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
>>> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line
>>> from a CSV container or direct from a file/URL?
>>> Best,
>>> Keith..
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> 




