public static Column unix_timestamp(Column s)
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale, return null if fail.
Parameters:
s - (undocumented)
Returns:
(undocumented)
Since:
1.5.0
from pyspark.sql import functions as F
df.withColumn('startuptime_stamp', F.unix_timestamp('startuptime')) 使用HiveSQL
select device_id, max(startuptime) as max_startuptime, min(startuptime) as min_startuptime from app_table group by device_id Spark处理数据存储到Hive的方式
alter table app_table drop if exists partition(datestr='$day_01');
load data inpath 'hdfs://xx/out/$day_01' overwrite into table app_table partition(datestr='$day_01'); hivectx.sql & insert
app_table1_df.registerTempTable("app_table1_tmp")
app_table2_df.registerTempTable("app_table2_tmp")
hivectx.sql("set spark.sql.shuffle.partitions=1")
hivectx.sql("alter table app_table drop if exists partition(datestr='%s')" % daystr)
hivectx.sql("insert overwrite table app_table partition(datestr='%s') select * from app_table1_tmp" % daystr)
hivectx.sql("insert into app_table partition(datestr='%s') select * from app_table2_tmp" % daystr) Spark处理新增列的方式map和udf、functions