This repository has been archived by the owner on Sep 10, 2021. It is now read-only.
/ apple_epf Public archive
forked from artemk/apple_epf

Downloader, Extractor and Parser for Apple Epf Affiliate files


eSpark/apple_epf

 
 

AppleEpf

Installation

  gem 'apple_epf'

Setup

Put this in your initializer.rb if you are using Rails.

AppleEpf.configure do |config|
  config.apple_id = 'username'
  config.apple_password = 'password'
  config.download_retry_count = 3 # number of retries for a failed download
  config.keep_tbz_after_extract = false
  config.extract_dir = '' # where to extract to
  config.files_matrix = {} # {popularity: ['application_popularity_per_genre']}
  #config.files_matrix = {itunes: [], pricing: [], popularity: []}
  config.download_processor = AppleEpf::AriaDownloadProcessor
  config.concurrent_downloads = 16
  config.log_file = "#{Rails.root}/log/apple_epf_#{Rails.env}.log"
  #config.archiver = :gnutar
  #config.archiver_path = "/usr/local/bin/tar"
end

Any of these settings can be overridden on an individual downloader.
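As a rough illustration of what a retry count such as config.download_retry_count implies, here is a minimal sketch of a retry wrapper. The helper name with_retries is hypothetical and this is not the gem's own code. It assumes the count means "retries after the first attempt", which the README does not confirm.

```ruby
# Hypothetical helper, not part of apple_epf.
# Runs the block once, then retries up to max_retries more times
# if it raises a StandardError. Re-raises the last error when
# all retries are used up.
def with_retries(max_retries)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts <= max_retries
    raise
  end
end

# Usage: the block receives the current attempt number.
with_retries(3) do |attempt|
  # Replace this with a real download call, e.g. downloader.download
  attempt
end
```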

Manual manipulations

  # Manually download one file
  downloader = AppleEpf::Downloader.new('incremental', 'popularity', Date.parse('17-01-2013'))
  downloader.download #=> returns the local path to the downloaded file or raises an exception

  # Manually extract one archive
  extractor = AppleEpf::Extractor.new(filename, files_to_extract)
  # filename - full path to local file
  # files_to_extract - Files to be extracted from Archive (application, application_detail)
  file_entry = extractor.perform #=> will return instance of FileEntry
  file_entry.tbz_file #=> original archive that was extracted. It is removed after untarring
  file_entry.extracted_files #=> newly created (unpacked) files

  # Manually parse one file
  parser = AppleEpf::Parser.new(filename)
  # filename - full local path to file

  parser.process_rows do |r|
    puts "row is #{r}"
  end
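To make the row handling above more concrete, here is a minimal, standalone sketch of reading an EPF flat file. It is not AppleEpf::Parser itself. It assumes the usual EPF layout: fields separated by the \x01 character, records terminated by \x02 plus a newline, and records starting with # holding metadata, with the first one listing the column names. The method name each_epf_row and the sample data are hypothetical.

```ruby
require 'stringio'

# Assumed EPF delimiters (not taken from the gem's source).
FIELD_SEPARATOR  = "\x01"
RECORD_SEPARATOR = "\x02\n"

# Yields each data record as a Hash keyed by column name.
# Falls back to a plain Array when no header record was seen.
def each_epf_row(io)
  header = nil
  io.each_line(RECORD_SEPARATOR) do |record|
    record = record.chomp(RECORD_SEPARATOR)
    next if record.empty?

    if record.start_with?('#')
      # The first comment record is assumed to list the column names;
      # later comment records (primary key, types, etc.) are skipped.
      header ||= record.delete_prefix('#').split(FIELD_SEPARATOR)
      next
    end

    values = record.split(FIELD_SEPARATOR, -1) # -1 keeps trailing empty fields
    yield(header ? header.zip(values).to_h : values)
  end
end

# Made-up sample data for illustration only.
sample = "#export_date\x01application_id\x01title\x02\n" \
         "#primaryKey:application_id\x02\n" \
         "1358931600000\x0142\x01Example App\x02\n"

each_epf_row(StringIO.new(sample)) do |row|
  puts "row is #{row}"
end
```

For real exports, pass File.open(filename) instead of a StringIO so the file is streamed record by record rather than loaded into memory.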

Download and Extract

If you want to combine downloading and extracting, you can use one of the following methods. In my opinion, parsing should live on its own and should not be combined in one stack with downloading and extracting. And of course it is better to download and extract files one by one.

  manager = AppleEpf::Incremental.new('10-10-2012', 
    { popularity: ['application_popularity_per_genre'] })


  manager = AppleEpf::Full.new('10-10-2012', 
    { popularity: ['application_popularity_per_genre'] })


  manager.download_all_files 
  # will download all files for this date 
  # for all keys "popularity", 'pricing', 'itunes' etc

  manager.download_and_extract_all_files 
  # will first download and then extract all files

  manager.download_and_extract('itunes', ['application', 'application_detail']) 
  # will download only 'itunes' and extract only ['application', 'application_detail']. 
  # This ignores the matrix passed to the initializer

  manager.download('itunes') # will only download the file
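The advice to download and extract files one by one could be sketched as a small loop over the matrix. It uses only the download_and_extract(key, files) call shown above. The helper name process_one_by_one and its return shape are hypothetical, and what download_and_extract returns is not specified here, so the sketch simply stores whatever comes back.

```ruby
# Hypothetical helper, not part of apple_epf.
# `manager` is anything that responds to download_and_extract(key, files),
# for example an AppleEpf::Incremental or AppleEpf::Full instance.
# Each archive is handled separately, so one failure does not stop the rest.
def process_one_by_one(manager, files_matrix)
  results = {}
  failures = {}

  files_matrix.each do |key, files|
    begin
      results[key] = manager.download_and_extract(key.to_s, files)
    rescue StandardError => e
      failures[key] = e # remember the error and continue with the next key
    end
  end

  [results, failures]
end

# Usage (requires the gem and valid credentials, so left as a comment):
#   matrix = { popularity: ['application_popularity_per_genre'] }
#   manager = AppleEpf::Incremental.new('10-10-2012', matrix)
#   extracted, failed = process_one_by_one(manager, matrix)
```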

You can override where files are stored by setting the directory directly on the manager or downloader instance

  manager.store_dir = '/whatever_dir_you_like'
  manager.download('itunes')

OR

  downloader = AppleEpf::Downloader.new('incremental', 'popularity', Date.parse('17-01-2013'))
  downloader.dirpath = '/whatever_dir_you_like'
  downloader.download

You can also choose to keep the original tbz files after they have been unpacked

extractor.keep_tbz_after_extract = true

OR

manager.keep_tbz_after_extract = true

Get the list of files currently available for download

AppleEpf::Incremental.get_current_list #=> current incremental files
AppleEpf::Full.get_current_list #=> current full files

Make sure to try AriaDownloadProcessor

There are 2 downloaders available for use:

  1. CurbDownloadProcessor - the default
  2. AriaDownloadProcessor - the one we use in production

I suggest using the latter, as it can download in parallel. I suggest setting config.concurrent_downloads to 16 or 8. If you choose to use aria, make sure you have aria2c in your PATH.

And of course you can write your own processor.

Make sure to try lbzip2

lbzip2 decompresses files in parallel. Thanks to @funkyboy for his help. Just set config.use_lbzip2 = true and make sure lbzip2 is in your PATH. On a Mac you can install it with brew install lbzip2.
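Since both aria2c and lbzip2 have to be on your PATH, a quick check before enabling them can save a confusing failure later. Here is a minimal sketch. The helper executable_in_path? is hypothetical and not part of apple_epf.

```ruby
# Hypothetical helper, not part of apple_epf.
# Returns true if an executable file called `name` exists in one of
# the directories listed in `path` (defaults to the PATH variable).
def executable_in_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Warn about any missing external tool.
%w[aria2c lbzip2].each do |tool|
  warn "#{tool} not found in PATH" unless executable_in_path?(tool)
end
```

In an initializer you could use the same check to fall back to the defaults, for example only setting config.use_lbzip2 = true when executable_in_path?('lbzip2') returns true.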
