Analyzing and Visualizing Data with F#


Working with data was not always as easy as it is nowadays. For example, processing the data from the decennial 1880 US Census took eight years. For the 1890 census, the United States Census Bureau hired Herman Hollerith, who invented a number of devices to automate the process. A pantograph punch was used to punch the data on punch cards, which were then fed to the tabulator, which counted cards with certain properties, or to the sorter for filtering. The census still required a large amount of clerical work, but Hollerith's machines sped up the process eightfold, cutting it to just one year.

These days, filtering and calculating sums over hundreds of millions of rows (the number of forms received in the 2010 US Census) can take seconds. Much of the data from the US Census, from various Open Government Data initiatives, and from international organizations like the World Bank is available online and can be analyzed by anyone. Hollerith's tabulator and sorter have become standard library functions in many programming languages and data analytics libraries.
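To make the comparison with Hollerith's machines concrete, here is a minimal F# sketch that uses only FSharp.Core list functions. The `Card` record and its values are hypothetical stand-ins for punched cards: `List.filter` plays the role of the sorter, `List.countBy` the role of the tabulator, and `List.sumBy` computes a sum over all rows.

```fsharp
// Hypothetical census "cards": each record stands for one punched card.
type Card = { State: string; Age: int; IsFarmer: bool }

let cards =
    [ { State = "Ohio"; Age = 34; IsFarmer = true }
      { State = "Ohio"; Age = 12; IsFarmer = false }
      { State = "Iowa"; Age = 51; IsFarmer = true }
      { State = "Iowa"; Age = 29; IsFarmer = false } ]

// The "sorter": keep only the cards that have a given property.
let farmers = cards |> List.filter (fun c -> c.IsFarmer)

// The "tabulator": count the cards, grouped by a property.
let countsByState = cards |> List.countBy (fun c -> c.State)

// A sum calculated over all rows.
let totalAge = cards |> List.sumBy (fun c -> c.Age)

printfn "Farmers: %d" (List.length farmers)
printfn "Cards per state: %A" countsByState
printfn "Total age: %d" totalAge
```

The same three operations scale from this toy list to large data sets; only the underlying storage and library change.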

Making data analytics easier no longer involves building new physical devices, but instead involves creating better software tools and programming languages. So, let’s see how the F# language and its unique features like type providers make the task of modern data analysis even easier!

Data Science Workflow

Data science is an umbrella term for a wide range of fields and disciplines that are needed to extract knowledge from data. The typical data science workflow is an iterative process. You start with an initial idea or research question, get some data, do a quick analysis, and make a visualization to show the results. This shapes your original idea, so you can go back and adapt your code. On the technical side, each of the three steps (accessing, analyzing, and visualizing data) involves a number of activities:

• Accessing data. The first step involves connecting to various data sources, downloading CSV files, or calling REST services. Then we need to combine data from different sources, align the data correctly, clean possible errors, and fill in missing values.

• Analyzing data. Once we have the data, we can calculate basic statistics about it, run machine learning algorithms, or write our own algorithms that help us explain what the data means.

• Visualizing data. Finally, we need to present the results. We may build a chart, create an interactive visualization that can be published, or write a report that presents the results of our analysis.
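The three steps can be illustrated with a deliberately small F# script that needs no external libraries. It is only a sketch under stated assumptions: the inline CSV text stands in for a downloaded file, the names `Alpha`, `Beta`, and `Gamma` and their populations are made up, and the "visualization" is a text bar chart printed to the console rather than output from a real charting library.

```fsharp
// Made-up CSV text standing in for a downloaded file.
let csv = """Country,Population
Alpha,9000000
Beta,6000000
Gamma,3000000"""

// 1. Accessing data: split into lines, skip the header row, parse the columns.
let rows =
    csv.Split('\n')
    |> Array.skip 1
    |> Array.map (fun line ->
        let cols = line.Trim().Split(',')
        cols.[0], int64 cols.[1])

// 2. Analyzing data: compute a basic statistic over the parsed rows.
let average = rows |> Array.averageBy (fun (_, pop) -> float pop)

// 3. Visualizing data: print a text bar chart, one '#' per million people.
for (country, pop) in rows do
    printfn "%-6s %s" country (String.replicate (int (pop / 1000000L)) "#")
printfn "Average population: %.0f" average
```

In a real analysis, each step would be revisited as the research question evolves, which is the iterative loop described above.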

If you ask any data scientist, she'll tell you that accessing data is the most frustrating part of the workflow. You need to download CSV files, figure out which columns contain which values, then determine how missing values are represented and parse them. When calling REST-based services, you need to understand the structure of the returned JSON and extract the values you care about. As you'll see in this chapter, the data access part is largely simplified in F# thanks to type providers that integrate external data sources directly into the language.
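For comparison, this is roughly what handling missing values looks like when done by hand, without a type provider. In this hypothetical input, missing values appear both as empty fields and as the string `NA`; the sketch maps both to `None` and then fills the gaps with the mean of the observed values, which is just one simple choice among many possible imputation strategies.

```fsharp
open System
open System.Globalization

// Hypothetical raw rows: missing values are encoded both as "" and as "NA".
let raw = [ "Alpha,12.5"; "Beta,"; "Gamma,NA"; "Delta,7.5" ]

// Treat empty strings and "NA" as missing; parse everything else as a float.
let parseValue (s: string) : float option =
    match s.Trim() with
    | "" | "NA" -> None
    | v ->
        match Double.TryParse(v, NumberStyles.Float, CultureInfo.InvariantCulture) with
        | true, x -> Some x
        | _ -> None

// Split each line into a name and an optional value.
let records =
    raw
    |> List.map (fun line ->
        match line.Split(',') with
        | [| name; value |] -> name, parseValue value
        | _ -> failwithf "Malformed line: %s" line)

// Fill in missing values with the mean of the observed ones.
let observed = records |> List.choose snd
let mean = List.average observed
let filled = records |> List.map (fun (name, v) -> name, defaultArg v mean)

for (name, v) in filled do
    printfn "%s: %.2f" name v
```

Every new data source needs its own version of this parsing and cleaning code, which is the kind of work the chapter says type providers largely take over.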


Attribution

Tomas Petricek. Analyzing and Visualizing Data with F#. https://web.archive.org/web/20201023042804/https://www.oreilly.com/programming/free/files/analyzing-visualizing-data-f-sharp.pdf
