Need Help Throttling Downloads From an FTP Site

Mike Bonner bonnmike at gmail.com
Mon Sep 21 19:44:24 EDT 2015


The problem doesn't seem to be a local network issue.  When I try to grab
files from the SEC site, too many connections made too quickly make it choke.
(Their end, not mine; most likely anti-bot code.)

As Scott Rossi said, using a delay should help.  The magic number seems to
be 5, so I used load and a counter, pausing after every 5 files, to get
reliable downloads.

local sList, sBaseUrl, sCount

on mouseUp
   put 0 into sCount
   -- the folder I chose to download from
   put "ftp://anonymous:nobody@ftp.sec.gov/edgar/forms/" into sBaseUrl
   put empty into field 2 -- my status field
   -- where I'm saving 'em
   set the defaultFolder to specialFolderPath("desktop") & "/downloads"
   put field 1 into sList -- my list of files
   downloadit -- start the downloads
end mouseUp

command downloadit
   repeat for each line tLine in sList
      -- pause every 5 files
      if sCount mod 5 is 0 then wait 5 seconds with messages
      -- load the url into cache, then process it with doDownloads
      load URL (sBaseUrl & tLine) with message "doDownloads"
      add 1 to sCount
   end repeat
end downloadit

command doDownloads pUrl, pStatus
   if pStatus is "cached" then
      -- name the local file after the last path segment of the url
      set the itemDelimiter to "/"
      -- save the file from cache into the defaultFolder
      put URL pUrl into URL ("binfile:" & item -1 of pUrl)
   end if
   put pUrl & ":" && pStatus & cr after field 2 -- update the status field
   unload URL pUrl -- clear the url from the cache
end doDownloads
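
Gregory also asked about not requesting another file until the current
download is complete. Here is a minimal, untested sketch of that idea using
libURLDownloadToFile with a callback. The handler names (startQueue,
fetchNext, fetchDone), the script locals, and the 1 second pause are all
made up for illustration; field 2 is the same status field as above.

```livecode
local sQueue, sQueueBaseUrl, sQueueFolder

-- pList: one file name per line; pFolder: an existing local folder
command startQueue pList, pBaseUrl, pFolder
   put pList into sQueue
   put pBaseUrl into sQueueBaseUrl
   put pFolder into sQueueFolder
   fetchNext
end startQueue

command fetchNext
   local tName, tUrl, tPath
   -- skip blank lines, stop when the queue is used up
   repeat while sQueue is not empty and line 1 of sQueue is empty
      delete line 1 of sQueue
   end repeat
   if sQueue is empty then exit fetchNext
   put line 1 of sQueue into tName
   delete line 1 of sQueue
   put sQueueBaseUrl & tName into tUrl
   put sQueueFolder & "/" & tName into tPath
   -- only one request is outstanding at a time;
   -- "fetchDone" is sent when it finishes or fails
   libURLDownloadToFile tUrl, tPath, "fetchDone"
end fetchNext

on fetchDone pUrl, pStatus
   put pUrl & ":" && pStatus & cr after field 2 -- log the result
   -- assumed delay; tune it to whatever the server tolerates
   send "fetchNext" to me in 1 second
end fetchDone
```

Calling startQueue field 1, sBaseUrl, (specialFolderPath("desktop") &
"/downloads") would kick it off. Because each request is only made from the
callback of the previous one, the server never sees more than one
connection from the script at a time.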

On Mon, Sep 21, 2015 at 4:33 PM, Scott Rossi <scott at tactilemedia.com> wrote:

> How large are the files you're retrieving?  If the script below is your
> actual script, you might try allowing some execution time in the loop:
>
> repeat for each line remoteFilePath in listOfFilePaths
>     -- localFileName is set before the download request is made
>     put url ("ftp://anonymous:myEmailAddress@ftp.sec.gov/" & remoteFilePath) \
>           into url ("file:/" & exportFolderPath & "/" & localFileName)
>     wait 2 seconds with messages --  <-- ADD THIS
> end repeat
>
> It would probably be most helpful to you to check the status of each
> request, so you can keep track of which events succeeded and which failed.
> I imagine there are folks on the list who have something like this more
> readily available than me.
>
> Regards,
>
> Scott Rossi
> Creative Director
> Tactile Media, UX/UI Design
>
>
>
> On 9/21/15, 2:33 PM, "use-livecode on behalf of Gregory Lypny"
> <use-livecode-bounces at lists.runrev.com on behalf of
> gregory.lypny at videotron.ca> wrote:
>
> > Hello everyone,
> >
> > I posted about this a while back but am still having trouble.
> >
> > I need to download thousands of files from the Securities and Exchange
> > Commission's website. Access is through anonymous FTP with "anonymous"
> > as the username and my email address as the password. I've been using
> > put in a repeat loop as
> >
> > repeat for each line remoteFilePath in listOfFilePaths
> >     -- localFileName is set before the download request is made
> >     put url ("ftp://anonymous:myEmailAddress@ftp.sec.gov/" & remoteFilePath) \
> >           into url ("file:/" & exportFolderPath & "/" & localFileName)
> > end repeat
> >
> > but my script dies (the stack is lifeless and unresponsive) after a few
> > dozen, and sometimes a few hundred downloads. I used similar scripts in
> > Mathematica and confirmed that the problem is session-timed-out and
> > cannot-connect-to-server types of errors. The SEC's webmaster tells me,
> > "There is no load/rate limiting on FTP, but if you are running a fast
> > process, it is possible you are temporarily overwhelming the server."
> > So, I'm thinking that I need to throttle my requests, and maybe should
> > be using libURLDownloadToFile to check the status of the current file
> > being downloaded and not request another file until the current
> > download is complete. I also wonder whether I should be connecting to
> > the FTP site only once with the username and password, loop my
> > requests, and then close the connection. Not sure how to do either of
> > these and would greatly appreciate any suggestions or tips.
> >
> > Gregory
> > _______________________________________________
> > use-livecode mailing list
> > use-livecode at lists.runrev.com
> > Please visit this url to subscribe, unsubscribe and manage your
> > subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
>


