It's our pleasure to highlight the initiative taken by our data team leader Ahmed Mahran to contribute effectively to the Spark Time Series project, created by Sandy Ryza, a senior data scientist at Cloudera, the leading big data solutions provider.

 

Time series data has gained increasing attention in the past few years. To quote Sandy Ryza:

 

Time-series analysis is becoming mainstream across multiple data-rich industries. The new Spark-TS library helps analysts and data scientists focus on business questions, not on building their own algorithms.

 

Find the full story here, where he introduces SparkTS and credits our contributor.

 

We are forever indebted to the open source community; it has enabled us to create wonderful feats. It is our deep belief that we should give back to the community in order to guarantee its health and sustainability. We are proud to have contributed effectively to such a great project, and we are looking forward to more.

Retrofit 2.0.0-beta1

September 6th 2015, 10:01 am | Category: Mobile

Retrofit is one of the most popular REST client libraries for Java, and it can be used in both Android and desktop Java applications.
We can combine Retrofit with the Gson library to deal with any REST API.

Here we will show how to use Retrofit 2.0.0 with the Gson library through a simple Android example that fetches a user's information and the list of their repos from the GitHub API.
 
First, set up Retrofit, Gson and GsonConverter by adding the following libraries to the dependencies block of your Gradle file:

dependencies {
 compile 'com.squareup.retrofit:retrofit:2.0.0-beta1'
 compile 'com.squareup.retrofit:converter-gson:2.0.0-beta1'
 compile 'com.google.code.gson:gson:2.2.4'
}
Then we can build the POJO (Plain Old Java Object) that we will use to map the response of the request, using http://www.jsonschema2pojo.org/. That is a very useful online tool: it converts a JSON object into Java classes annotated for the Gson library.


Create a new class GitHubUser in your code; we will use it in the next steps.
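For illustration, here is a minimal sketch of what such a generated class might look like; the field names are assumptions based on the GitHub API response, and only the two fields used later in this post are shown:

import com.google.gson.annotations.SerializedName;

public class GitHubUser {
    // Field names follow the JSON keys returned by the GitHub users endpoint.
    @SerializedName("email")
    private String email;

    @SerializedName("avatar_url")
    private String avatarUrl;

    public String getEmail() {
        return email;
    }

    public String getAvatarUrl() {
        return avatarUrl;
    }
}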

Create a GitHubService interface declaring the endpoints we will call:

public interface GitHubService {
    @GET("/users/{user}")
    Call<GitHubUser> getUserInfo(@Path("user") String user);

    @GET("/users/{user}/repos")
    Call<List<Repo>> listRepos(@Path("user") String user);
}

Now we can use Retrofit by putting the following code in any Android service that extends IntentService (an IntentService runs on a background thread, so the synchronous execute() call is safe there):
    // Build the Retrofit instance with the GitHub base URL and a Gson converter.
    Retrofit retrofit = new Retrofit.Builder()
            .baseUrl("https://api.github.com")
            .addConverterFactory(GsonConverterFactory.create())
            .build();

    // Create an implementation of the GitHubService interface.
    GitHubService gitHubService = retrofit.create(GitHubService.class);

    // Fetch the user's information synchronously.
    Call<GitHubUser> gitHubUserCall = gitHubService.getUserInfo("octocat");
    try {
        Response<GitHubUser> response = gitHubUserCall.execute();
        if (response.isSuccess()) {
            GitHubUser user = response.body();
            Log.d("User Email:", user.getEmail());
            Log.d("User Avatar URL:", user.getAvatarUrl());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
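The interface above also declares listRepos, which returns the user's repositories. Here is a minimal sketch of calling it the same way; the Repo class and its getName getter are assumptions, generated from the repos JSON just like GitHubUser:

    // Fetch the user's repositories synchronously.
    Call<List<Repo>> reposCall = gitHubService.listRepos("octocat");
    try {
        Response<List<Repo>> reposResponse = reposCall.execute();
        if (reposResponse.isSuccess()) {
            for (Repo repo : reposResponse.body()) {
                Log.d("Repo name:", repo.getName());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }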

In the search space, pagination always has to happen. Solr offers basic paging: you simply specify the start and rows parameters, where start indicates the offset at which the returned results should begin and rows specifies how many documents are returned. With basic paging, partial index exporting and migration is a problem. Since basic paging needs to sort all the results before returning the desired subset, it needs a large amount of memory when start is large. For instance, start=1000000 and rows=10 forces an inefficient memory allocation due to the sorting of 1,000,010 documents. In a distributed environment, the case is worse, because the engine has to fetch a million documents from each shard, sort them, and then return the result set. The sketch below illustrates the shape of such a query.
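A minimal sketch of basic paging with a SolrJ 5.x-style client; the Solr URL and collection name are placeholders chosen for illustration:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class BasicPagingExample {
    public static void main(String[] args) throws Exception {
        // The base URL and collection name are assumptions for this example.
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");

        SolrQuery query = new SolrQuery("*:*");
        // Basic paging: 'start' is the offset, 'rows' is the page size.
        // Solr still has to collect and sort start + rows documents to serve this page.
        query.setStart(1000000);
        query.setRows(10);

        QueryResponse response = client.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
        client.close();
    }
}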

PhoneGap is an open source framework for building cross-platform mobile apps using basic web skills like HTML5, JavaScript and CSS. It is all about wrapping your app with PhoneGap to deploy it quickly on different platforms. I started developing applications with it in April '12; at the beginning I was really impressed, since I could use all my web skills to develop apps, but after only 6 months I began to be frustrated by performance and UI issues, such as achieving smooth animations, among other problems.

When inserting a large amount of data into the DB in Rails, a large number of insert statements takes a lot of time. We can reduce this time dramatically using mysqlimport, a MySQL tool for inserting bulk data into the DB quickly; quoting the MySQL documentation, mysqlimport "reads rows from a text file into a table at a very high speed". To use mysqlimport for updating records as well, not only for inserting new ones, we used a common trick: create a temporary table, insert the updated records (with ids) into it, join the temporary table with the target table on the id column and update the target table's columns accordingly, then drop the temporary table. You can find this idea illustrated, for example, here: http://dba.stackexchange.com/questions/11811/mysql-csv-update-not-insert-into-existing-table


So here is a helper class to use this approach within a Rails application.
 
class SqlWriter

  ID_STR = 'id'
  CREATED_AT_STR = 'created_at'
  UPDATED_AT_STR = 'updated_at'
  NULL_STR = '\N'
  COMMA_STR = ','
  attr_accessor :insert_sql_file, :update_sql_file
  # klass is the class of the records we will deal with
  # sql_dir_path is the directory which will contain the sql data file(text file).
  def initialize(klass, sql_dir_path)
    @klass = klass
    @temp_table_name = "temp_#{klass.table_name}_#{Time.now.to_s(:db).gsub(/-| |:/,'_')}_#{SecureRandom.hex[0..10]}"
    @insert_sql_file = File.new("#{sql_dir_path}/#{klass.table_name}.txt", 'w')
    @update_sql_file = File.new("#{sql_dir_path}/#{@temp_table_name}.txt", 'w')
    @current_time_in_db_format = Time.now.to_s(:db)
    @insert_fields = klass.new.attributes.except(ID_STR).keys
    @update_fields = klass.new.attributes.keys
    @records_need_update = false
  end

  def write_record_to_sql_file(record)
    row_data = get_sql_row(record)
    if record.new_record?
      @insert_sql_file.write("#{row_data}\n")
    else
      @update_sql_file.write("#{row_data}\n")
    end
  end

  def insert_records_to_database
    @insert_sql_file.close
    @update_sql_file.close
    config   = Rails.configuration.database_configuration
    database = config[Rails.env]["database"]
    username = config[Rails.env]["username"]
    password = config[Rails.env]["password"]
    host = config[Rails.env]["host"]
    insert_columns_orders = @insert_fields.join(',')
    `mysqlimport -u #{username} -p#{password} -h #{host} --columns='#{insert_columns_orders}' --local --fields-terminated-by=',' #{database} #{Shellwords.escape(@insert_sql_file.path)}`
    if @records_need_update
      ActiveRecord::Base.connection.execute("CREATE TABLE #{@temp_table_name} LIKE #{@klass.table_name};")
      update_columns_orders = @update_fields.join(',')
      `mysqlimport -u #{username} -p#{password} -h #{host} --columns='#{update_columns_orders}' --local --fields-terminated-by=',' #{database} #{Shellwords.escape(@update_sql_file.path)}`
      set_fields = @insert_fields.map{|field| "#{@klass.table_name}.#{field}=#{@temp_table_name}.#{field}"}.join(',')
      ActiveRecord::Base.connection.execute("UPDATE #{@klass.table_name} INNER JOIN #{@temp_table_name} ON #{@klass.table_name}.id = #{@temp_table_name}.id SET #{set_fields}")
      ActiveRecord::Base.connection.execute("DROP TABLE #{@temp_table_name}")
    end
    File.delete(@update_sql_file)
  end

  private
    def get_sql_row(record)
      if record.new_record?
        result = record.attributes.except(ID_STR).values
        fields = @insert_fields
      else
        result =  record.attributes.values
        fields = @update_fields
        @records_need_update = true
      end
      result.each_with_index do |item, index|
        if item.class == Date || item.class == Time
          result[index] = item.to_s(:db)
        elsif item == true || item == false
          result[index] = item ? 1 : 0
        elsif item == nil
          if fields[index] == CREATED_AT_STR || fields[index] == UPDATED_AT_STR
            result[index] = @current_time_in_db_format
          else
            result[index] = NULL_STR
          end
        end
      end
      result.join(COMMA_STR)
    end
end

For example, assume that we are inserting a large number of records of the User model:
 
sql_file_dir = "path/to/some/dir"
sql_writer = SqlWriter.new(User, sql_file_dir)
alot_of_data.each do |data|
  #......
  user = User.new(user_attributes)
  sql_writer.write_record_to_sql_file(user)
end
sql_writer.insert_records_to_database 
And you will have your data inserted into the DB!