org.apache.lens.ml.spark
Class ColumnFeatureFunction

java.lang.Object
  extended by org.apache.lens.ml.spark.FeatureFunction
      extended by org.apache.lens.ml.spark.ColumnFeatureFunction
All Implemented Interfaces:
Serializable, org.apache.spark.api.java.function.Function<scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord>,org.apache.spark.mllib.regression.LabeledPoint>

public class ColumnFeatureFunction
extends FeatureFunction

A feature function that maps an HCatRecord directly to a feature vector. Each selected column becomes one feature in the vector, and the feature's value is obtained by applying the value mapper for that column.

See Also:
Serialized Form

Field Summary
static org.apache.log4j.Logger LOG
          Logger used by this class.
 
Constructor Summary
ColumnFeatureFunction(int[] featurePositions, FeatureValueMapper[] valueMappers, int labelColumnPos, int numFeatures, double defaultLabel)
          Feature positions and value mappers are parallel arrays.
 
Method Summary
 org.apache.spark.mllib.regression.LabeledPoint call(scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord> tuple)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.log4j.Logger LOG
Logger used by this class.

Constructor Detail

ColumnFeatureFunction

public ColumnFeatureFunction(int[] featurePositions,
                             FeatureValueMapper[] valueMappers,
                             int labelColumnPos,
                             int numFeatures,
                             double defaultLabel)
Feature positions and value mappers are parallel arrays. featurePositions[i] gives the position of the i-th feature in the HCatRecord, and valueMappers[i] gives the value mapper used to map that feature to a Double value.

Parameters:
featurePositions - positions of the feature columns in the HCatRecord
valueMappers - value mapper for each feature column, in the same order as featurePositions
labelColumnPos - position of the label column
numFeatures - number of features in the feature vector
defaultLabel - default label to be used for null records
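
The parallel-array contract described above can be illustrated with a small, self-contained sketch. It does not use the Lens, Spark, or HCatalog classes: the record is modelled as a plain Object[], and ColumnMapper, Point, and toPoint are hypothetical stand-ins introduced only for this illustration. The handling of a null record (default label with all-zero features) and of a null label value is an assumption, not a statement of how ColumnFeatureFunction is implemented.

```java
import java.util.Arrays;

// Simplified illustration of the parallel-array mapping; NOT the Lens implementation.
final class ParallelArraySketch {

    /** Hypothetical stand-in for a per-column value mapper. */
    interface ColumnMapper {
        double map(Object value);
    }

    /** Hypothetical stand-in for a labeled point: a label plus a dense feature array. */
    static final class Point {
        final double label;
        final double[] features;

        Point(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Builds a point from a record, using featurePositions and valueMappers as parallel arrays. */
    static Point toPoint(Object[] record, int[] featurePositions, ColumnMapper[] valueMappers,
                         int labelColumnPos, int numFeatures, double defaultLabel) {
        double[] features = new double[numFeatures];
        if (record == null) {
            // Assumption: a null record yields the default label and all-zero features.
            return new Point(defaultLabel, features);
        }
        for (int i = 0; i < numFeatures; i++) {
            // featurePositions[i] selects the column; valueMappers[i] converts its value.
            features[i] = valueMappers[i].map(record[featurePositions[i]]);
        }
        Object rawLabel = record[labelColumnPos];
        // Assumption: a missing label value also falls back to the default label.
        double label = rawLabel == null ? defaultLabel : ((Number) rawLabel).doubleValue();
        return new Point(label, features);
    }

    public static void main(String[] args) {
        Object[] record = {"ignored", 3, 2.5, 1.0};
        ColumnMapper numeric = v -> ((Number) v).doubleValue();
        Point p = toPoint(record, new int[] {1, 2}, new ColumnMapper[] {numeric, numeric}, 3, 2, 0.0);
        System.out.println(p.label + " " + Arrays.toString(p.features)); // prints "1.0 [3.0, 2.5]"
    }
}
```

In this sketch, column 0 is skipped because it does not appear in featurePositions, columns 1 and 2 become features 0 and 1, and column 3 supplies the label.
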
Method Detail

call

public org.apache.spark.mllib.regression.LabeledPoint call(scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord> tuple)
                                                    throws Exception
Specified by:
call in interface org.apache.spark.api.java.function.Function<scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord>,org.apache.spark.mllib.regression.LabeledPoint>
Specified by:
call in class FeatureFunction
Throws:
Exception


Copyright © 2014–2015 Apache Software Foundation. All rights reserved.