[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

Overview


TensorFrames (Deprecated)

Note: TensorFrames is deprecated. You can use pandas UDF instead.
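
For reference, the same kind of block-wise computation can be written as a scalar Pandas UDF. Below is a minimal sketch (not part of TensorFrames), assuming Spark 2.3+ with PyArrow installed and a DataFrame df with a double column 'x' as in the first example below; the name plus_three is illustrative.

import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Scalar Pandas UDF: receives one pandas Series per Arrow batch and returns a Series.
@pandas_udf("double", PandasUDFType.SCALAR)
def plus_three(s):
    return s + 3

df2 = df.withColumn("z", plus_three(df["x"]))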

Experimental TensorFlow binding for Scala and Apache Spark.

TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs.

This package is experimental and is provided as a technical preview only. While the interfaces are all implemented and working, there are still some areas of low performance.

Supported platforms:

This package officially supports only Linux 64-bit platforms as a target. Contributions are welcome for other platforms.

See the file project/Dependencies.scala for adding your own platform.

Officially TensorFrames supports Spark 2.4+ and Scala 2.11.

See the user guide for extensive information about the API.

For questions, see the TensorFrames mailing list.

TensorFrames is available as a Spark package.

Requirements

  • A working version of Apache Spark (2.4 or greater)

  • Java 8+

  • (Optional) Python 2.7+/3.6+ if you want to use the Python interface.

  • (Optional) the Python TensorFlow package if you want to use the Python interface. See the official instructions on how to get the latest release of TensorFlow.

  • (Optional) pandas >= 0.19.1 if you want to use the Python interface.

Additionally, for development, you need the following dependencies:

  • protoc 3.x

  • nose >= 1.3

How to run in Python

Assuming that SPARK_HOME is set, you can use PySpark like any other Spark package.

$SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.6.0-s_2.11

Here is a small program that uses TensorFlow to add 3 to an existing column.

import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row
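# Note: sqlContext is predefined when this code runs in the pyspark shell started above.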

data = [Row(x=float(x)) for x in range(10)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The TensorFlow placeholder that corresponds to column 'x'.
    # The shape of the placeholder is automatically inferred from the DataFrame.
    x = tfs.block(df, "x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)

# The transform is lazy as for most DataFrame operations. This will trigger it:
df2.collect()

# Notice that z is an extra column next to x

# [Row(z=3.0, x=0.0),
#  Row(z=4.0, x=1.0),
#  Row(z=5.0, x=2.0),
#  Row(z=6.0, x=3.0),
#  Row(z=7.0, x=4.0),
#  Row(z=8.0, x=5.0),
#  Row(z=9.0, x=6.0),
#  Row(z=10.0, x=7.0),
#  Row(z=11.0, x=8.0),
#  Row(z=12.0, x=9.0)]

The second example shows block-wise reducing operations: we compute the sum and the elementwise minimum of a field containing vectors of floating-point numbers, working with blocks of rows for more efficient processing.

# Build a DataFrame of vectors
data = [Row(y=[float(y), float(-y)]) for y in range(10)]
df = sqlContext.createDataFrame(data)
# Because the dataframe contains vectors, we need to analyze it first to find the
# dimensions of the vectors.
df2 = tfs.analyze(df)

# The information gathered by TF can be printed to check the content:
tfs.print_schema(df2)
# root
#  |-- y: array (nullable = false) double[?,2]

# Let's use the analyzed dataframe to compute the sum and the elementwise minimum 
# of all the vectors:
# First, let's make a copy of the 'y' column. This will be very cheap in Spark 2.0+
df3 = df2.select(df2.y, df2.y.alias("z"))
with tf.Graph().as_default() as g:
    # The placeholders. Note the special names that end with '_input':
    y_input = tfs.block(df3, 'y', tf_name="y_input")
    z_input = tfs.block(df3, 'z', tf_name="z_input")
    y = tf.reduce_sum(y_input, [0], name='y')
    z = tf.reduce_min(z_input, [0], name='z')
    # The resulting dataframe
    (data_sum, data_min) = tfs.reduce_blocks([y, z], df3)

# The final results are numpy arrays:
print(data_sum)
# [45., -45.]
print(data_min)
# [0., -9.]

Notes

Note the scoping of the graphs above. This is important because TensorFrames finds which DataFrame column to feed to TensorFlow based on the placeholders of the graph. Also, it is good practice to keep graphs small when sending them to Spark.

For small tensors (scalars and vectors), TensorFrames usually infers the shapes of the tensors without requiring a preliminary analysis. If it cannot do it, an error message will indicate that you need to run the DataFrame through tfs.analyze() first.
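
As a minimal sketch of that workflow (reusing the pyspark shell, imports, and sqlContext from the examples above; the column name vec and the tensor name doubled are illustrative), analyze the DataFrame first so that tfs.block can build a correctly shaped placeholder:

data = [Row(vec=[float(i), float(i + 1)]) for i in range(4)]
# analyze() records the per-row shape (here double[?,2]) in the column metadata.
df_vec = tfs.analyze(sqlContext.createDataFrame(data))
with tf.Graph().as_default() as g:
    vec = tfs.block(df_vec, "vec")
    doubled = tf.multiply(vec, 2.0, name="doubled")
    df_doubled = tfs.map_blocks(doubled, df_vec)
df_doubled.collect()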

Look at the Python documentation of the TensorFrames package to see what methods are available.

How to run in Scala

The Scala support is a bit more limited than the Python support. In Scala, operations can be loaded from an existing graph defined in the Protocol Buffers format, or built using a simple Scala DSL. The Scala DSL only features a subset of TensorFlow transforms. It is very easy to extend, though, so other transforms will be added without much effort in the future.

You simply use the published package:

$SPARK_HOME/bin/spark-shell --packages databricks:tensorframes:0.6.0-s_2.11

Here is the same program as before:

import org.tensorframes.{dsl => tf}
import org.tensorframes.dsl.Implicits._

val df = spark.createDataFrame(Seq(1.0->1.1, 2.0->2.2)).toDF("a", "b")

// As in Python, scoping is recommended to prevent name collisions.
val df2 = tf.withGraph {
    val a = df.block("a")
    // Unlike Python, the Scala syntax is more flexible:
    val out = a + 3.0 named "out"
    // The 'mapBlocks' method is added using implicits to dataframes.
    df.mapBlocks(out).select("a", "out")
}

// The transform is all lazy at this point, let's execute it with collect:
df2.collect()
// res0: Array[org.apache.spark.sql.Row] = Array([1.0,4.0], [2.0,5.0])   

How to compile and install for developers

It is recommended that you use a Conda environment to guarantee that the build environment can be reproduced. Once you have installed Conda, you can set up the environment from the root of the project:

conda create -q -n tensorframes-environment python=$PYTHON_VERSION

This will create an environment for your project. We recommend using Python version 3.7 or 2.7.13. After the environment is created, you can activate it and install all dependencies as follows:

conda activate tensorframes-environment
pip install --user -r python/requirements.txt

You also need to compile the Scala code. The recommended procedure is to use the assembly:

build/sbt tfs_testing/assembly
# Builds the spark package:
build/sbt distribution/spDist

Assuming that SPARK_HOME is set and that you are in the root directory of the project:

$SPARK_HOME/bin/spark-shell --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

If you want to run the python version:

PYTHONPATH=$PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar \
$SPARK_HOME/bin/pyspark --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

Acknowledgements

Before TensorFlow released its Java API, this project was built on the great javacpp project, which implements the low-level bindings between TensorFlow and the Java virtual machine.

Many thanks to Google for the release of TensorFlow.

Comments
  •  java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps

    java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps

    I built the jar by following the readme, and then ran it in PyCharm https://www.dropbox.com/s/qmrs72l0p8p4bc2/Screen%20Shot%202016-07-06%20at%2011.40.26%20PM.png?dl=0 I added the self-built jar as a content root; I guess that's what causes the error.

    line 11 is x = tfs.block(df, "x")

    code:

    import tensorflow as tf
    import tensorframes as tfs
    from pyspark.shell import sqlContext
    from pyspark.sql import Row
    
    data = [Row(x=float(x)) for x in range(10)]
    df = sqlContext.createDataFrame(data)
    
    with tf.Graph().as_default() as g:
        # The TensorFlow placeholder that corresponds to column 'x'.
        # The shape of the placeholder is automatically inferred from the DataFrame.
        x = tfs.block(df, "x")
        # The output that adds 3 to x
        z = tf.add(x, 3, name='z')
        # The resulting dataframe
        df2 = tfs.map_blocks(z, df)
    
    # The transform is lazy as for most DataFrame operations. This will trigger it:
    df2.collect()
    

    log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    16/07/06 23:28:43 INFO SparkContext: Running Spark version 1.6.1
    16/07/06 23:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/07/06 23:28:44 INFO SecurityManager: Changing view acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: Changing modify acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(julian_qian); users with modify permissions: Set(julian_qian)
    16/07/06 23:28:44 INFO Utils: Successfully started service 'sparkDriver' on port 60597.
    16/07/06 23:28:45 INFO Slf4jLogger: Slf4jLogger started
    16/07/06 23:28:45 INFO Remoting: Starting remoting
    16/07/06 23:28:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60598]
    16/07/06 23:28:45 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60598.
    16/07/06 23:28:45 INFO SparkEnv: Registering MapOutputTracker
    16/07/06 23:28:45 INFO SparkEnv: Registering BlockManagerMaster
    16/07/06 23:28:45 INFO DiskBlockManager: Created local directory at /private/var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/blockmgr-5174cef3-29d9-4d2a-a84e-279a0e3d2f83
    16/07/06 23:28:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
    16/07/06 23:28:45 INFO SparkEnv: Registering OutputCommitCoordinator
    16/07/06 23:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    16/07/06 23:28:45 INFO Utils: Successfully started service 'SparkUI' on port 4041.
    16/07/06 23:28:45 INFO SparkUI: Started SparkUI at http://10.63.21.172:4041
    16/07/06 23:28:45 INFO Executor: Starting executor ID driver on host localhost
    16/07/06 23:28:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60599.
    16/07/06 23:28:45 INFO NettyBlockTransferService: Server created on 60599
    16/07/06 23:28:45 INFO BlockManagerMaster: Trying to register BlockManager
    16/07/06 23:28:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60599 with 511.1 MB RAM, BlockManagerId(driver, localhost, 60599)
    16/07/06 23:28:45 INFO BlockManagerMaster: Registered BlockManager
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
          /_/
    
    Using Python version 2.7.10 (default, Dec  1 2015 20:00:13)
    SparkContext available as sc, HiveContext available as sqlContext.
    16/07/06 23:28:46 INFO HiveContext: Initializing execution hive, version 1.2.1
    16/07/06 23:28:46 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:46 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:46 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:46 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:46 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:46 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:46 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:47 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:48 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:49 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:49 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62_resources
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62/_tmp_space.db
    16/07/06 23:28:49 INFO HiveContext: default warehouse location is /user/hive/warehouse
    16/07/06 23:28:49 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    16/07/06 23:28:49 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:49 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:50 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:50 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:51 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:52 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:52 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:52 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:52 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:52 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:52 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:53 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:53 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/77eb618d-61cc-470e-abb4-18d356833efb_resources
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb/_tmp_space.db
    

    error log:

    Traceback (most recent call last):
      File "/Users/julian_qian/PycharmProjects/tensorflow/tfs.py", line 11, in <module>
        x = tfs.block(df, "x")
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 315, in block
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 333, in _auto_placeholder
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 30, in _java_api
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o32.loadClass.
    : java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    
    opened by jq 13
  • Updated to tensorflow 1.6 and spark 2.3.

    Updated to tensorflow 1.6 and spark 2.3.

    Current version is not compatible with graphs generated by tf1.6 and it's preventing us from releasing dl-pipelines with tf1.6 support.

    • updated protobuf files and regenerated their java sources.
    • few minor changes related to Tensor taking a type parameter in tf1.6.
    opened by tomasatdatabricks 8
  • tensorframes is not working with variables.

    tensorframes is not working with variables.

    data = [Row(x=float(x)) for x in range(5)]
    df = sqlContext.createDataFrame(data)
    with tf.Graph().as_default() as g:
        # The placeholder that corresponds to column 'x'
        x = tf.placeholder(tf.double, shape=[None], name="x")
        # The output that adds 3 to x
        b = tf.Variable(float(3), name='a', dtype=tf.double)
        z = tf.add(x, b, name='z')
        # With or without `sess.run(tf.global_variables_initializer())`, the following will fail
        
        df2 = tfs.map_blocks(z, df)
    
    df2.show()
    
    opened by yupbank 7
  • Does not work with Python3

    Does not work with Python3

    I just started using this with Python3, these are my commands run and the output messages.

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    Python 3.4.3 (default, Mar 26 2015, 22:03:40) [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Ivy Default Cache set to: /root/.ivy2/cache
    The jars for the packages stored in: /root/.ivy2/jars
    :: loading settings :: url = jar:file:/opt/spark-1.5.2/assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    databricks#tensorframes added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found databricks#tensorframes;0.2.3-s_2.10 in spark-packages
        found org.apache.commons#commons-lang3;3.4 in central
    :: resolution report :: resolve 98ms :: artifacts dl 4ms
        :: modules in use:
        databricks#tensorframes;0.2.3-s_2.10 from spark-packages in [default]
        org.apache.commons#commons-lang3;3.4 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/3ms)
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
          /_/

    Using Python version 3.4.3 (default, Mar 26 2015 22:03:40)
    SparkContext available as sc, SQLContext available as sqlContext.

    import tensorflow as tf
    import tensorframes as tfs

    Traceback (most recent call last):
      File "", line 1, in
      File "/tmp/spark-349c9955-ccd8-4fcd-938a-7e719fc45653/userFiles-bb935142-224f-4238-a144-f1cece7a5aa2/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/init.py", line 36, in
    ImportError: No module named 'core'

    opened by ushnish 6
  • Scala example does not work

    Scala example does not work

    I'm having trouble running the provided Scala example in the spark shell.

    My local environment is:

    • Spark 2.1.0
    • Scala version 2.11.8
    • Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121

    I ran the spark-shell with: spark-shell --packages databricks:tensorframes:0.2.5-rc2-s_2.11

    I get the following stacktrace which shuts down my spark process:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fff90451b52, pid=64869, tid=0x0000000000001c03
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libsystem_c.dylib+0x1b52]  strlen+0x12
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /Users/ndrizard/projects/temps/hs_err_pid64869.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    Thanks for your help!

    opened by nicodri 5
  • Py4JError("Answer from Java side is empty") while testing

    Py4JError("Answer from Java side is empty") while testing

    I have been experimenting with TensorFrames from quite some days. I have spark-1.6.1 and openjdk7 installed on my ubuntu 14.04 64bit machine. I am using IPython notebook for testing.

    The import tensorframes as tfs command works perfectly fine, but when I do tfs.print_schema(df), where df is a DataFrame, the error below pops up recursively until the maximum recursion depth is reached.

    ERROR:py4j.java_gateway:Error while sending or receiving.
    Traceback (most recent call last):
      File "/home/prakhar/utilities/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
        raise Py4JError("Answer from Java side is empty")
    Py4JError: Answer from Java side is empty

    opened by prakhar21 4
  • [ML-7986] Update tensorflow to 1.14.0

    [ML-7986] Update tensorflow to 1.14.0

    • Update tensorflow version to 1.14.0 in environment.yml, project/Dependencies.scala, and python/requirements.txt
    • Auto update *.proto with the script. All of this type update comes from tensorflow.
    opened by lu-wang-dl 3
  • Support Spark 2.3.1, TF 1.10.0 and drop Spark 2.1/2.2 (and hence Scala 2.10, Java 7)

    Support Spark 2.3.1, TF 1.10.0 and drop Spark 2.1/2.2 (and hence Scala 2.10, Java 7)

    • Drop support for Spark 2.1 and 2.2 and hence scala 2.10 and java 7
    • Update TF to 1.10 release
    • Remove nix files, which are not used
    • Update README

    We will support Spark 2.4 once RC is released.

    opened by mengxr 3
  • Usage of tf.contrib.distributions.percentile fails

    Usage of tf.contrib.distributions.percentile fails

    Consider the following dummy example using tf.contrib.distributions.percentile:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    import tensorflow as tf
    import tensorframes as tfs
    from pyspark import SQLContext
    from pyspark.sql import Row
    from pyspark.sql.functions import *
    
    conf = SparkConf().setAppName("repro")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    
    data = [Row(x=[1.111, 0.516, 12.759]), Row(x=[2.222, 1.516, 13.759]), Row(x=[3.333, 2.516, 14.759]), Row(x=[4.444, 3.516, 15.759])]
    df = tfs.analyze(sqlContext.createDataFrame(data))
    
    with tf.Graph().as_default() as g:
        x = tfs.block(df, "x")
        q = tf.constant(90, 'float64', name='Percentile')
        qntl = tf.contrib.distributions.percentile(x, q, axis=1)
        result = tfs.map_blocks(x, df)
    

    This fails with

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2752, in _as_graph_element_locked
        return op.outputs[out_n]
    IndexError: list index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 5, in <module>
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 152, in _map
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 83, in _add_shapes
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2880, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2708, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2757, in _as_graph_element_locked
        % (repr(name), repr(op_name), len(op.outputs)))
    KeyError: "The name 'percentile/assert_integer/statically_determined_was_integer:0' refers to a Tensor which does not exist. The operation, 'percentile/assert_integer/statically_determined_was_integer', exists but only has 0 outputs."
    
    opened by martinstuder 3
  • Readme Example throwing Py4J error

    Readme Example throwing Py4J error

    I am using Spark 2.0.2, Python 2.7.12, iPython 5.1.0 on macOS 10.12.1.

    I am launching pyspark like this

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    From the demo, this block

    with tf.Graph().as_default() as g:
        x = tfs.block(df, "x")
        z = tf.add(x, 3, name='z')
        df2 = tfs.map_blocks(z, df)
    

    crashes with the following traceback:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-3-e7ae284146c3> in <module>()
          4     # The TensorFlow placeholder that corresponds to column 'x'.
          5     # The shape of the placeholder is automatically inferred from the DataFrame.
    ----> 6     x = tfs.block(df, "x")
          7     # The output that adds 3 to x
          8     z = tf.add(x, 3, name='z')
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in block(df, col_name, tf_name)
        313     :return: a TensorFlow placeholder.
        314     """
    --> 315     return _auto_placeholder(df, col_name, tf_name, block = True)
        316
        317 def row(df, col_name, tf_name = None):
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _auto_placeholder(df, col_name, tf_name, block)
        331
        332 def _auto_placeholder(df, col_name, tf_name, block):
    --> 333     info = _java_api().extra_schema_info(df._jdf)
        334     col_shape = [x.shape() for x in info if x.fieldName() == col_name]
        335     if len(col_shape) == 0:
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _java_api()
         28     # You cannot simply call the creation of the the class on the _jvm due to classloader issues
         29     # with Py4J.
    ---> 30     return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
         31         .newInstance()
         32
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1131         answer = self.gateway_client.send_command(command)
       1132         return_value = get_return_value(
    -> 1133             answer, self.gateway_client, self.target_id, self.name)
       1134
       1135         for temp_arg in temp_args:
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
         61     def deco(*a, **kw):
         62         try:
    ---> 63             return f(*a, **kw)
         64         except py4j.protocol.Py4JJavaError as e:
         65             s = e.java_exception.toString()
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        317                 raise Py4JJavaError(
        318                     "An error occurred while calling {0}{1}{2}.\n".
    --> 319                     format(target_id, ".", name), value)
        320             else:
        321                 raise Py4JError(
    
    Py4JJavaError: An error occurred while calling o47.loadClass.
    : java.lang.NoClassDefFoundError: org/apache/spark/Logging
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:280)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	... 22 more
    
    opened by damienstanton 3
  • Spark 2.0.0 + ScalaTest 3.0.0 + updates sbt plugins

    Spark 2.0.0 + ScalaTest 3.0.0 + updates sbt plugins

    The subject says it all.

    WARNING: doit works (since it disables tests in assembly), but I could not get sbt test working. It fails with the following error, which is more about TensorFlow (which I know nothing about):

    ➜  tensorframes git:(spark-200-and-other-upgrades) sbt
    [info] Loading global plugins from /Users/jacek/.sbt/0.13/plugins
    [info] Loading project definition from /Users/jacek/dev/oss/tensorframes/project
    [info] Set current project to tensorframes (in build file:/Users/jacek/dev/oss/tensorframes/)
    > testOnly org.tensorframes.dsl.BasicOpsSuite
    16/08/04 23:52:22 DEBUG Paths$: Request for x -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for y -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for z -> 0
    
    import tensorflow as tf
    
    x = tf.constant(1, name='x')
    y = tf.constant(2, name='y')
    z = tf.add(x, y, name='z')
    
    g = tf.get_default_graph().as_graph_def()
    for n in g.node:
        print ">>>>>", str(n.name), "<<<<<<"
        print n
    
    [info] BasicOpsSuite:
    [info] - Add *** FAILED ***
    [info]   1 did not equal 0 (1,===========
    [info]   
    [info]   import tensorflow as tf
    [info]   
    [info]   x = tf.constant(1, name='x')
    [info]   y = tf.constant(2, name='y')
    [info]   z = tf.add(x, y, name='z')
    [info]         
    [info]   g = tf.get_default_graph().as_graph_def()
    [info]   for n in g.node:
    [info]       print ">>>>>", str(n.name), "<<<<<<"
    [info]       print n
    [info]          
    [info]   ===========) (ExtractNodes.scala:40)
    [info] Run completed in 1 second, 772 milliseconds.
    [info] Total number of tests run: 1
    [info] Suites: completed 1, aborted 0
    [info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
    [info] *** 1 TEST FAILED ***
    [error] Failed tests:
    [error]         org.tensorframes.dsl.BasicOpsSuite
    [error] (test:testOnly) sbt.TestsFailedException: Tests unsuccessful
    [error] Total time: 2 s, completed Aug 4, 2016 11:52:22 PM
    

    I'm proposing the PR hoping the issue is a minor one that could easily be fixed with enough guidance.

    opened by jaceklaskowski 3
  • Bump tensorflow from 1.15.0 to 2.9.3 in /python

    Bump tensorflow from 1.15.0 to 2.9.3 in /python

    Bumps tensorflow from 1.15.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Support with deep learning model plugging

    Support with deep learning model plugging

    Can you guys help plug this deep learning model, https://github.com/hongzimao/decima-sim, into TensorFrames? Is it possible to do? Any help will be highly appreciated.

    opened by jahidhasanlinix 0
  • Need help with enabling GPUs while predicting through fine-tuned BERT Tensorflow Model on Azure Databricks

    Need help with enabling GPUs while predicting through fine-tuned BERT Tensorflow Model on Azure Databricks

    Hi, I am referring to this code (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb for classification) and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model. However, for predictions, even though I am using a 4-GPU cluster, it is still taking a very long time. I suspect that my cluster is not fully utilized and is in fact still being used as CPU only... Is there anything I need to change to ensure that the GPU cluster is being utilized and able to function in a distributed manner?

    I also referred to Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and did install gpu enabled tensorflow mentioned as:

    %pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl

    But even after that, print([tf.version, tf.test.is_gpu_available()]) still shows False, and there is no improvement in my cluster utilization. Can anyone help with how I can enable full cluster utilization (across worker nodes) for my predictions through the fine-tuned BERT model?

    I would really appreciate the help.

    opened by samvygupta 0
  • Having java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()

    Having java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()

    Hi, I want to use DeepImageFeaturizer combined with Spark ML logistic regression in Spark (2.4.5) / Scala 2.11.12, but it's not working. I've been trying to resolve it for many days.

    I have this issue : java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;

    It seems a library is missing, but I think I've already referenced all the needed ones:

    delta-core_2.11-0.6.0.jar
    libtensorflow-1.15.0.jar
    libtensorflow_jni-1.15.0.jar
    libtensorflow_jni_gpu-1.15.0.jar
    proto-1.15.0.jar
    scala-logging-api_2.11-2.1.2.jar
    scala-logging-slf4j_2.11-2.1.2.jar
    scala-logging_2.11-3.9.2.jar
    spark-deep-learning-1.5.0-spark2.4-s_2.11.jar
    spark-sql-kafka-0-10_2.11-2.4.5.jar
    spark-tensorflow-connector_2.11-1.6.0.jar
    tensorflow-1.15.0.jar
    tensorflow-hadoop-1.15.0.jar
    tensorframes-0.8.2-s_2.11.jar
    

    Full trace :

    20/05/15 21:17:28 DEBUG impl.TensorFlowOps$: Outputs: Set(InceptionV3_sparkdl_output__)
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
    Caused by: java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;
    	at org.tensorframes.impl.TensorFlowOps$.graphSerial(TensorFlowOps.scala:69)
    	at org.tensorframes.impl.TensorFlowOps$.analyzeGraphTF(TensorFlowOps.scala:114)
    	at org.tensorframes.impl.DebugRowOps.mapRows(DebugRowOps.scala:408)
    	at com.databricks.sparkdl.DeepImageFeaturizer.transform(DeepImageFeaturizer.scala:135)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:161)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    	at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
    	at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
    	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
    

    Can someone on the team tell me what is going wrong? Thanks for your support.

    opened by eleite77 0
  • Could not initialize class org.tensorframes.impl.SupportedOperations

    Could not initialize class org.tensorframes.impl.SupportedOperations

    Py4JJavaError: An error occurred while calling o162.analyze. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 10.244.31.75, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:944) at org.tensorframes.ExtraOperations$.deepAnalyzeDataFrame(ExperimentalOperations.scala:113) at org.tensorframes.ExperimentalOperations$class.analyze(ExperimentalOperations.scala:41) at org.tensorframes.impl.DebugRowOps.analyze(DebugRowOps.scala:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at 
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more

    opened by lee2015new 2
Releases(v0.6.0)
  • v0.6.0(Nov 16, 2018)

  • v0.5.0(Aug 21, 2018)

  • v0.4.0(Jun 18, 2018)

  • v0.2.9(Sep 13, 2017)

    This is the final release for 0.2.9.

    Notable changes since 0.2.8:

    • Upgrades tensorflow dependency from version 1.1.0 to 1.3.0
    • map_blocks, map_row APIs now accept Pandas DataFrames as input
    • Adds support for tensorflow variables. Note that these variables cannot be shared between the worker nodes.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8(Apr 25, 2017)

    This is the final release for 0.2.8.

    Notable changes since 0.2.5:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for multiple hardware platforms (CPU, GPU) and operating systems (linux, macos). Windows should also work but it has not been tested.
    • support for Spark 2.1.x and Spark 2.2.x
    • some usability and performance fixes, which should give a better experience for users
    • more flexible input names for mapRows.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8-rc0(Apr 24, 2017)

    This is the first release candidate for 0.2.8.

    Notable changes:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for Spark 2.1.x
    • the same release should support both CPU and GPU clusters
    • some usability and performance fixes, which should give a better experience for users
    Source code(tar.gz)
    Source code(zip)
Owner
Databricks
Helping data teams solve the world’s toughest problems using data and AI