pyspark.sql.DataFrameReader
Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.read to access this.
New in version 1.4.
Methods
csv(path[, schema, sep, encoding, quote, …])
Loads a CSV file and returns the result as a DataFrame.
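A minimal sketch of reading a CSV file, assuming an active SparkSession; the path "people.csv" and the presence of a header row are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read a CSV file, treating the first line as column names and inferring column types.
    df = spark.read.csv("people.csv", header=True, inferSchema=True)
    df.printSchema()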
format(source)
Specifies the input data source format.
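For example, format() combined with load() is equivalent to calling the shorthand json() method; the path "events.json" below is hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Choose the source format explicitly, then load the data.
    df = spark.read.format("json").load("events.json")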
jdbc(url, table[, column, lowerBound, …])
Construct a DataFrame representing the database table named table accessible via JDBC URL url and connection properties.
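A sketch of a JDBC read against a hypothetical PostgreSQL database; the URL, table name, and credentials are placeholders, and the matching JDBC driver JAR must be available on the Spark classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the "public.orders" table over JDBC into a DataFrame.
    df = spark.read.jdbc(
        url="jdbc:postgresql://dbhost:5432/shop",
        table="public.orders",
        properties={"user": "reader", "password": "secret", "driver": "org.postgresql.Driver"},
    )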
json(path[, schema, primitivesAsString, …])
Loads JSON files and returns the results as a DataFrame.
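A minimal sketch, assuming a hypothetical file "events.json" in JSON Lines format (one JSON object per line, the default expected by this reader):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each line of the file becomes one row; the schema is inferred from the data.
    df = spark.read.json("events.json")
    df.show()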
load([path, format, schema])
Loads data from a data source and returns it as a DataFrame.
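load() is the generic entry point; format and options can be passed as keyword arguments instead of chaining format()/option(). The path "data.parquet" is hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Same result as spark.read.parquet("data.parquet").
    df = spark.read.load("data.parquet", format="parquet")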
option(key, value)
Adds an input option for the underlying data source.
options(**options)
Adds input options for the underlying data source.
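option() sets a single key-value pair while options() sets several at once; both return the reader, so calls can be chained. A sketch using standard CSV options (header, sep, inferSchema) against a hypothetical "sales.csv":

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Set one option individually, then two more in bulk, before reading the CSV file.
    df = (spark.read
          .option("header", "true")
          .options(sep=";", inferSchema="true")
          .csv("sales.csv"))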
orc(path[, mergeSchema, pathGlobFilter, …])
Loads ORC files, returning the result as a DataFrame.
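A minimal sketch for a hypothetical ORC file; ORC files carry their own schema, so none needs to be supplied:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read an ORC file directly into a DataFrame.
    df = spark.read.orc("logs.orc")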
parquet(*paths, **options)
Loads Parquet files, returning the result as a DataFrame.
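parquet() accepts one or more paths as positional arguments; the directory paths below are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read two Parquet directories into a single DataFrame.
    df = spark.read.parquet("events/2023/", "events/2024/")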
schema(schema)
Specifies the input schema.
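Declaring the schema up front avoids schema inference and the extra pass over the data it requires. A sketch with hypothetical column names and file path; schema() also accepts an equivalent DDL string such as "name STRING, age INT":

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.getOrCreate()

    # Build an explicit schema and apply it before reading the CSV file.
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    df = spark.read.schema(schema).csv("people.csv")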
table(tableName)
Returns the specified table as a DataFrame.
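A minimal sketch, assuming a hypothetical table "sales" already registered in the catalog (for example via saveAsTable or createOrReplaceTempView):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Look up the catalog table by name and return it as a DataFrame.
    df = spark.read.table("sales")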
text(paths[, wholetext, lineSep, …])
Loads text files and returns a DataFrame whose schema starts with a string column named “value”, followed by partitioned columns if there are any.
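A minimal sketch for a hypothetical plain-text file; each line becomes one row in the single "value" column:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the file line by line into a one-column DataFrame of strings.
    df = spark.read.text("notes.txt")
    df.printSchema()  # root |-- value: string (nullable = true)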