Spark SQL - Data Sources



Spark SQL supports operating on a variety of data sources through the DataFrame interface. A DataFrame can be operated on as a normal RDD, and it can also be registered as a temporary table; registering a DataFrame as a table allows you to run SQL queries over its data.

In this chapter, we will describe the general methods for loading and saving data using different Spark DataSources. Thereafter, we will discuss in detail the specific options that are available for the built-in data sources.
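As a minimal sketch of these generic methods, the following spark-shell session uses the read/write API (available from Spark 1.4 onward). The file names and column names here are placeholders for illustration, not part of this tutorial's data set.

import org.apache.spark.sql.SQLContext

// In spark-shell, sc (the SparkContext) is already defined.
val sqlContext = new SQLContext(sc)

// Generic load: the format is named explicitly; Parquet is the default if omitted.
val df = sqlContext.read.format("parquet").load("users.parquet")

// Generic save: write a projection of the data out in another format.
df.select("name", "age").write.format("json").save("namesAndAges.json")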

There are different types of data sources available in Spark SQL, some of which are listed below −

1. JSON Datasets − Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame.

2. Hive Tables − Hive comes bundled with the Spark library as HiveContext, which inherits from SQLContext.

3. Parquet Files − Parquet is a columnar format, supported by many data processing systems.

A short loading sketch for these sources is shown below.

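As a quick preview, and assuming the same spark-shell sqlContext as above together with hypothetical files employee.json and employee.parquet in the working directory, JSON and Parquet data can be loaded as follows. Hive tables additionally require a HiveContext and a Hive-enabled Spark build, so they are only indicated in comments here; each source is covered in detail in its own chapter.

// Schema inference for JSON: no schema needs to be declared up front.
val jsonDF = sqlContext.read.json("employee.json")
jsonDF.printSchema()

// Parquet files carry their own schema as part of the columnar format.
val parquetDF = sqlContext.read.parquet("employee.parquet")
parquetDF.registerTempTable("employee")
sqlContext.sql("SELECT * FROM employee").show()

// Hive tables would instead be queried through a HiveContext, for example:
// val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// hiveContext.sql("SELECT * FROM src").show()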