Extracting Data from NoSQL Databases: A Step towards Interactive Visual Analysis of NoSQL Data

Introduction

This chapter introduces the main topic of the thesis and the motivation behind it. It starts with a presentation of the problem background in Section 1.1. Following this, the problem formulation is given in Section 1.2. Then, the main contributions of the thesis are presented in Section 1.3. Finally, an outline of the rest of the thesis is given in Section 1.4.

1.1 RDBMSs, NoSQL and Spotfire

Codd presented his relational model for data stores as early as 1970. Ever since, the model has been widely adopted within the IT industry as the default data model for databases. Traditionally, the major database management systems (DBMSs) have been based on this model, resulting in so-called relational DBMSs (RDBMSs). Examples include MySQL, Oracle Database and Microsoft SQL Server.

In recent years, the requirements for many modern applications have changed significantly, especially since the rise of the Web in the late 1990s and early 2000s. Large websites must now be able to serve billions of pages every day, and web users expect their data to be ubiquitously accessible at the speed of light no matter the time of day. Some of these requirements have made RDBMSs unsatisfactory as data stores in several ways: their throughput is too low, they do not scale well, and the relational model simply does not map well to some applications.

As a reaction to this, new types of DBMSs under the umbrella term NoSQL have become popular. The term NoSQL is often interpreted as short for “Not only SQL”, where SQL (Structured Query Language) refers to the default data management language in RDBMSs. The purpose of this movement is to provide alternatives where RDBMSs are a bad fit. The term covers a wide range of different systems. In general, NoSQL databases use non-relational data models, lack schema definitions and scale horizontally.

Businesses and organizations today generate increasing volumes of data, including information about their customers, suppliers, competitors and operations. Being able to analyze and visualize this data to find trends and anomalies that can subsequently be acted on to create economic value has become an important factor for competitive advantage, and its importance will most likely increase even more in the coming years.

Spotfire is a software platform for interactive data analysis and visualization, developed by the company TIBCO Software. Spotfire enables businesses and organizations to understand their data so that they can make informed decisions which in turn create economic value.

While Spotfire includes tools for analyzing and visualizing data, the data itself is supplied by the user. The range of systems from which data can be imported into Spotfire is a key competitive factor for the product. Today, Spotfire supports importing data from several RDBMSs, spreadsheets and CSV-like formats. However, there is currently no support for any NoSQL alternative.

The Spotfire platform data model is based on the relational model; data is represented by rectangular tables consisting of rows and columns. The relationship to RDBMSs is natural and thus automatic extraction and import from such data sources into Spotfire tables is often a simple task. Because of the non-relational data models and lack of explicitly defined schemas in NoSQL databases, the relationship to Spotfire tables and how to extract data is not always as obvious.
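
To make this mismatch concrete, the following is a minimal illustrative sketch in Python. It is not Spotfire's import mechanism, nor a method taken from this thesis. The function name `documents_to_table` and the sample records are hypothetical. The sketch assumes flat, schemaless records and infers the table columns as the union of all keys seen, filling missing values with `None`.

```python
from __future__ import annotations

from typing import Any


def documents_to_table(
    docs: list[dict[str, Any]],
) -> tuple[list[str], list[list[Any]]]:
    """Map schemaless records to a rectangular table (hypothetical helper).

    Columns are the union of all keys, in first-seen order.
    A record that lacks a key gets None in that column.
    """
    columns: list[str] = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)

    # One row per record, one cell per inferred column.
    rows = [[doc.get(col) for col in columns] for doc in docs]
    return columns, rows


# Hypothetical sample data: the two records do not share the same fields,
# which a schemaless store permits but a relational table does not.
docs = [
    {"name": "Alice", "city": "Gothenburg"},
    {"name": "Bob", "age": 42},
]
columns, rows = documents_to_table(docs)
```

Even in this simple case the schema has to be inferred by scanning the data, and the resulting table contains empty cells. Nested values and inconsistent types, which this sketch does not handle, make the mapping less obvious still.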

 

Attribution

Petter Nasholm. Extracting Data from NoSQL Databases: A Step towards Interactive Visual Analysis of NoSQL Data. http://publications.lib.chalmers.se/records/fulltext/155048.pdf
