Practical Big Data Analytics : (Record no. 1629)

MARC details
000 -LEADER
fixed length control field	09726nam a22005293i 4500
001 - CONTROL NUMBER
control field	EBC5254586
003 - CONTROL NUMBER IDENTIFIER
control field	PCN
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20231129142200.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS
fixed length control field	m o d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	cr cnu||||||||
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	231129s2018 xx o ||||0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781783554409
Qualifying information	(electronic bk.)

Canceled/invalid ISBN	9781783554393
035 ## - SYSTEM CONTROL NUMBER
System control number	(MiAaPQ)EBC5254586

System control number	(Au-PeEL)EBL5254586

System control number	(CaPaEBR)ebr11505138

System control number	(OCoLC)1021887799
040 ## - CATALOGING SOURCE
Original cataloging agency	MiAaPQ
Language of cataloging	eng
Description conventions	rda
--	pn
Transcribing agency	MiAaPQ
Modifying agency	MiAaPQ
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name	Dasgupta, Nataraj.
245 10 - TITLE STATEMENT
Title	Practical Big Data Analytics :
Remainder of title	Hands-On Techniques to Implement Enterprise Analytics and Machine Learning Using Hadoop, Spark, NoSQL and R /
Statement of responsibility, etc.	Nataraj Dasgupta, Giancarlo Zaccone, and Patrick Hannah
250 ## - EDITION STATEMENT
Edition statement	1st ed.
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture	Birmingham :
Name of producer, publisher, distributor, manufacturer	Packt Publishing, Limited,
Date of production, publication, distribution, manufacture, or copyright notice	2018.

Date of production, publication, distribution, manufacture, or copyright notice	©2018.
300 ## - PHYSICAL DESCRIPTION
Extent	1 online resource (402 pages)
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent
337 ## - MEDIA TYPE
Media type term	computer
Media type code	c
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	online resource
Carrier type code	cr
Source	rdacarrier
505 0# - FORMATTED CONTENTS NOTE
Formatted contents note	Cover -- Copyright and Credits -- Packt Upsell -- Contributors -- Table of Contents -- Preface -- Chapter 1: Too Big or Not Too Big -- What is big data? -- A brief history of data -- Dawn of the information age -- Dr. Alan Turing and modern computing -- The advent of the stored-program computer -- From magnetic devices to SSDs -- Why we are talking about big data now if data has always existed -- Definition of big data -- Building blocks of big data analytics -- Types of Big Data -- Structured -- Unstructured -- Semi-structured -- Sources of big data -- The 4Vs of big data -- When do you know you have a big data problem and where do you start your search for the big data solution? -- Summary -- Chapter 2: Big Data Mining for the Masses -- What is big data mining? -- Big data mining in the enterprise -- Building the case for a Big Data strategy -- Implementation life cycle -- Stakeholders of the solution -- Implementing the solution -- Technical elements of the big data platform -- Selection of the hardware stack -- Selection of the software stack -- Summary -- Chapter 3: The Analytics Toolkit -- Components of the Analytics Toolkit -- System recommendations -- Installing on a laptop or workstation -- Installing on the cloud -- Installing Hadoop -- Installing Oracle VirtualBox -- Installing CDH in other environments -- Installing Packt Data Science Box -- Installing Spark -- Installing R -- Steps for downloading and installing Microsoft R Open -- Installing RStudio -- Installing Python -- Summary -- Chapter 4: Big Data With Hadoop -- The fundamentals of Hadoop -- The fundamental premise of Hadoop -- The core modules of Hadoop -- Hadoop Distributed File System - HDFS -- Data storage process in HDFS -- Hadoop MapReduce -- An intuitive introduction to MapReduce -- A technical understanding of MapReduce -- Block size and number of mappers and reducers.

Formatted contents note	Hadoop YARN -- Job scheduling in YARN -- Other topics in Hadoop -- Encryption -- User authentication -- Hadoop data storage formats -- New features expected in Hadoop 3 -- The Hadoop ecosystem -- Hands-on with CDH -- WordCount using Hadoop MapReduce -- Analyzing oil import prices with Hive -- Joining tables in Hive -- Summary -- Chapter 5: Big Data Mining with NoSQL -- Why NoSQL? -- The ACID, BASE, and CAP properties -- ACID and SQL -- The BASE property of NoSQL -- The CAP theorem -- The need for NoSQL technologies -- Google Bigtable -- Amazon Dynamo -- NoSQL databases -- In-memory databases -- Columnar databases -- Document-oriented databases -- Key-value databases -- Graph databases -- Other NoSQL types and summary of other types of databases -- Analyzing Nobel Laureates data with MongoDB -- JSON format -- Installing and using MongoDB -- Tracking physician payments with real-world data -- Installing kdb+, R, and RStudio -- Installing kdb+ -- Installing R -- Installing RStudio -- The CMS Open Payments Portal -- Downloading the CMS Open Payments data -- Creating the Q application -- Loading the data -- The backend code -- Creating the frontend web portal -- R Shiny platform for developers -- Putting it all together - The CMS Open Payments application -- Applications -- Summary -- Chapter 6: Spark for Big Data Analytics -- The advent of Spark -- Limitations of Hadoop -- Overcoming the limitations of Hadoop -- Theoretical concepts in Spark -- Resilient distributed datasets -- Directed acyclic graphs -- SparkContext -- Spark DataFrames -- Actions and transformations -- Spark deployment options -- Spark APIs -- Core components in Spark -- Spark Core -- Spark SQL -- Spark Streaming -- GraphX -- MLlib -- The architecture of Spark -- Spark solutions -- Spark practicals -- Signing up for Databricks Community Edition.

Formatted contents note	Spark exercise - hands-on with Spark (Databricks) -- Summary -- Chapter 7: An Introduction to Machine Learning Concepts -- What is machine learning? -- The evolution of machine learning -- Factors that led to the success of machine learning -- Machine learning, statistics, and AI -- Categories of machine learning -- Supervised and unsupervised machine learning -- Supervised machine learning -- Vehicle Mileage, Number Recognition and other examples -- Unsupervised machine learning -- Subdividing supervised machine learning -- Common terminologies in machine learning -- The core concepts in machine learning -- Data management steps in machine learning -- Pre-processing and feature selection techniques -- Centering and scaling -- The near-zero variance function -- Removing correlated variables -- Other common data transformations -- Data sampling -- Data imputation -- The importance of variables -- The train, test splits, and cross-validation concepts -- Splitting the data into train and test sets -- The cross-validation parameter -- Creating the model -- Leveraging multicore processing in the model -- Summary -- Chapter 8: Machine Learning Deep Dive -- The bias, variance, and regularization properties -- The gradient descent and VC Dimension theories -- Popular machine learning algorithms -- Regression models -- Association rules -- Confidence -- Support -- Lift -- Decision trees -- The Random forest extension -- Boosting algorithms -- Support vector machines -- The K-Means machine learning technique -- The neural networks related algorithms -- Tutorial - associative rules mining with CMS data -- Downloading the data -- Writing the R code for Apriori -- Shiny (R Code) -- Using custom CSS and fonts for the application -- Running the application -- Summary -- Chapter 9: Enterprise Data Science -- Enterprise data science overview.

Formatted contents note	A roadmap to enterprise analytics success -- Data science solutions in the enterprise -- Enterprise data warehouse and data mining -- Traditional data warehouse systems -- Oracle Exadata, Exalytics, and TimesTen -- HP Vertica -- Teradata -- IBM data warehouse systems (formerly Netezza appliances) -- PostgreSQL -- Greenplum -- SAP Hana -- Enterprise and open source NoSQL Databases -- Kdb+ -- MongoDB -- Cassandra -- Neo4j -- Cloud databases -- Amazon Redshift, Redshift Spectrum, and Athena databases -- Google BigQuery and other cloud services -- Azure CosmosDB -- GPU databases -- Brytlyt -- MapD -- Other common databases -- Enterprise data science - machine learning and AI -- The R programming language -- Python -- OpenCV, Caffe, and others -- Spark -- Deep learning -- H2O and Driverless AI -- Datarobot -- Command-line tools -- Apache MADlib -- Machine learning as a service -- Enterprise infrastructure solutions -- Cloud computing -- Virtualization -- Containers - Docker, Kubernetes, and Mesos -- On-premises hardware -- Enterprise Big Data -- Tutorial - using RStudio in the cloud -- Summary -- Chapter 10: Closing Thoughts on Big Data -- Corporate big data and data science strategy -- Ethical considerations -- Silicon Valley and data science -- The human factor -- Characteristics of successful projects -- Summary -- Appendix: External Data Science Resources -- Big data resources -- NoSQL products -- Languages and tools -- Creating dashboards -- Notebooks -- Visualization libraries -- Courses on R -- Courses on machine learning -- Machine learning and deep learning links -- Web-based machine learning services -- Movies -- Machine learning books from Packt -- Books for leisure reading -- Other Books You May Enjoy -- Leave a review - let other readers know what you think -- Index.
520 ## - SUMMARY, ETC.
Summary, etc.	Big Data analytics relates to the strategies used by enterprises to process and analyze large amounts of data to bring out hidden insights. With the help of open source and enterprise tools, such as R, Python, Hadoop, and Spark, you will learn how to effectively mine your Big Data. By the end of this book, you will have a clear understanding.
588 ## - SOURCE OF DESCRIPTION NOTE
Source of description note	Description based on publisher supplied metadata and other sources.
590 ## - LOCAL NOTE (RLIN)
Local note	Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2023. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Machine learning

Topical term or geographic name entry element	Cloud computing

Topical term or geographic name entry element	Big data

Topical term or geographic name entry element	Data mining
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term	Electronic books.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Dasgupta, Nataraj.

Personal name	Zaccone, Giancarlo.

Personal name	Hannah, Patrick.
776 08 - ADDITIONAL PHYSICAL FORM ENTRY
Relationship information	Print version:
Main entry heading	Dasgupta, Nataraj
Title	Practical Big Data Analytics
Place, publisher, and date of publication	Birmingham : Packt Publishing, Limited, c2018
International Standard Book Number	9781783554393
797 2# - LOCAL ADDED ENTRY--CORPORATE NAME (RLIN)
Corporate name or jurisdiction name as entry element	ProQuest (Firm)
856 40 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://ebookcentral.proquest.com/lib/alcm/detail.action?docID=5254586
Public note	Click to View (ProQuest Ebook Central)
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	E-Book

No items available.