Author: Jordan Tigani
Publisher: John Wiley & Sons
Release Date: 2014-05-21
How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets.
Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.
• Features a companion website that includes all code and data sets from the book
• Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
• Includes web application examples coded in Python
Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions.
Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on “Big Data” have been little more than business polemics or product catalogs. Data Just Right is different: It’s a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that’s where you can derive the most value. Manoochehri shows how to address each of today’s key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You’ll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today’s leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.
Coverage includes:
• Mastering the four guiding principles of Big Data success and avoiding common pitfalls
• Emphasizing collaboration and avoiding problems with siloed data
• Hosting and sharing multi-terabyte datasets efficiently and economically
• “Building for infinity” to support rapid growth
• Developing a NoSQL Web app with Redis to collect crowd-sourced data
• Running distributed queries over massive datasets with Hadoop, Hive, and Shark
• Building a data dashboard with Google BigQuery
• Exploring large datasets with advanced visualization
• Implementing efficient pipelines for transforming immense amounts of data
• Automating complex processing with Apache Pig and the Cascading Java library
• Applying machine learning to classify, recommend, and predict incoming information
• Using R to perform statistical analysis on massive datasets
• Building highly efficient analytics workflows with Python and Pandas
• Establishing sensible purchasing strategies: when to build, buy, or outsource
• Previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist
Big Data Application Architecture Pattern Recipes provides an insight into the heterogeneous infrastructures, databases, and visualization and analytics tools used for realizing the architectures of big data solutions. Its problem-solution approach helps in selecting the right architecture to solve the problem at hand. In the process of reading through these problems, you will learn to harness the power of new big data opportunities, which various enterprises use to attain real-time profits. Big Data Application Architecture Pattern Recipes answers one of the most critical questions of this time: 'How do you select the best end-to-end architecture to solve your big data problem?' The book deals with various mission-critical problems encountered by solution architects, consultants, and software architects while dealing with the myriad options available for implementing a typical solution, trying to extract insight from huge volumes of data in real time and across multiple relational and non-relational data types for clients from industries like retail, telecommunication, banking, and insurance. The patterns in this book provide the strong architectural foundation required to launch your next big data application. The architectures for realizing these opportunities are based on less expensive, heterogeneous infrastructures compared to the traditional monolithic and hugely expensive options that exist currently. This book describes and evaluates the benefits of heterogeneity, which brings with it multiple options for solving the same problem, evaluation of trade-offs, and validation of 'fitness-for-purpose' of the solution.
We are living in the dawn of what has been termed the "Fourth Industrial Revolution," which is marked by the emergence of "cyber-physical systems" where software interfaces seamlessly over networks with physical systems, such as sensors, smartphones, vehicles, power grids or buildings, to create a new world of Internet of Things (IoT). Data and information are the fuel of this new age, where powerful analytics algorithms burn this fuel to generate decisions that are expected to create a smarter and more efficient world for all of us to live in. This new area of technology has been defined as Big Data Science and Analytics, and the industrial and academic communities are realizing this as a competitive technology that can generate significant new wealth and opportunity. Big data is defined as collections of datasets whose volume, velocity or variety is so large that it is difficult to store, manage, process and analyze the data using traditional databases and data processing tools. Big data science and analytics deals with collection, storage, processing and analysis of massive-scale data. Industry surveys, by Gartner and e-Skills, for instance, predict that there will be over 2 million job openings for engineers and scientists trained in the area of data science and analytics alone, and that the job market in this area is growing at a rate of 150 percent year over year. We have written this textbook, as part of our expanding "A Hands-On Approach"(TM) series, to meet this need at colleges and universities, and also for big data service providers who may be interested in offering a broader perspective of this emerging field to accompany their customer and developer training programs. The typical reader is expected to have completed a couple of college-level courses in programming using traditional high-level languages, and is either a senior or a beginning graduate student in one of the science, technology, engineering or mathematics (STEM) fields.
An accompanying website for this book contains additional support for instruction and learning (www.big-data-analytics-book.com). The book is organized into three main parts, comprising a total of twelve chapters. Part I provides an introduction to big data, applications of big data, and big data science and analytics patterns and architectures. A novel data science and analytics application system design methodology is proposed and its realization through use of open-source big data frameworks is described. This methodology describes big data analytics applications as realizations of the proposed Alpha, Beta, Gamma and Delta models, which comprise tools and frameworks for collecting and ingesting data from various sources into the big data analytics infrastructure, distributed filesystems and non-relational (NoSQL) databases for data storage, and processing frameworks for batch and real-time analytics. This new methodology forms the pedagogical foundation of this book. Part II introduces the reader to various tools and frameworks for big data analytics, and the architectural and programming aspects of these frameworks, with examples in Python. We describe Publish-Subscribe messaging frameworks (Kafka & Kinesis), Source-Sink connectors (Flume), Database Connectors (Sqoop), Messaging Queues (RabbitMQ, ZeroMQ, RestMQ, Amazon SQS) and custom REST, WebSocket and MQTT-based connectors. The reader is introduced to data storage, batch and real-time analysis, and interactive querying frameworks including HDFS, Hadoop, MapReduce, YARN, Pig, Oozie, Spark, Solr, HBase, Storm, Spark Streaming, Spark SQL, Hive, Amazon Redshift and Google BigQuery. Also described are serving databases (MySQL, Amazon DynamoDB, Cassandra, MongoDB) and the Django Python web framework. Part III introduces the reader to various machine learning algorithms with examples using the Spark MLlib and H2O frameworks, and visualizations using frameworks such as Lightning, Pygal and Seaborn.
Big data is currently one of the most critical emerging technologies. Organizations around the world are looking to exploit the explosive growth of data to unlock previously hidden insights in the hope of creating new revenue streams, gaining operational efficiencies, and obtaining greater understanding of customer needs. It is important to think of big data and analytics together. Big data is the term used to describe the recent explosion of different types of data from disparate sources. Analytics is about examining data to derive interesting and relevant trends and patterns, which can be used to inform decisions, optimize processes, and even drive new business models. With today's deluge of data come the problems of processing that data, obtaining the correct skills to manage and analyze that data, and establishing rules to govern the data's use and distribution. The big data technology stack is ever growing and sometimes confusing, even more so when we add the complexities of setting up big data environments with large up-front investments. Cloud computing seems to be a perfect vehicle for hosting big data workloads. However, working on big data in the cloud brings its own challenge of reconciling two contradictory design principles. Cloud computing is based on the concepts of consolidation and resource pooling, but big data systems (such as Hadoop) are built on the shared-nothing principle, where each node is independent and self-sufficient. A solution architecture that can allow these mutually exclusive principles to coexist is required to truly exploit the elasticity and ease of use of cloud computing for big data environments. This IBM® Redpaper™ publication is aimed at chief architects, line-of-business executives, and CIOs to provide an understanding of the cloud-related challenges they face and give prescriptive guidance for how to realize the benefits of big data solutions quickly and cost-effectively.
Author: Feras Alhlou
Publisher: John Wiley & Sons
Release Date: 2016-09-06
Genre: Business & Economics
A complete, start-to-finish guide to Google Analytics instrumentation and reporting.
Google Analytics Breakthrough is a much-needed comprehensive resource for the world's most widely adopted analytics tool. Designed to provide a complete, best-practices foundation in measurement strategy, implementation, reporting, and optimization, this book systematically demystifies the broad range of Google Analytics features and configurations. Throughout the end-to-end learning experience, you'll sharpen your core competencies, discover hidden functionality, learn to avoid common pitfalls, and develop next-generation tracking and analysis strategies so you can understand what is helping or hindering your digital performance and begin driving more success. Google Analytics Breakthrough offers practical instruction and expert perspectives on the full range of implementation and reporting skills:
• Learn how to campaign-tag inbound links to uncover the email, social, PPC, and banner/remarketing traffic hiding as other traffic sources and to confidently measure the ROI of each marketing channel
• Add event tracking to capture the many important user interactions that Google Analytics does not record by default, such as video plays, PDF downloads, scrolling, and AJAX updates
• Master Google Tag Manager for greater flexibility and process control in implementation
• Set up goals and Enhanced Ecommerce tracking to measure performance against organizational KPIs and configure conversion funnels to isolate drop-off
• Create audience segments that map to your audience constituencies, amplify trends, and help identify optimization opportunities
• Populate custom dimensions that reflect your organization, your content, and your visitors so Google Analytics can speak your language
• Gain a more complete view of customer behavior with mobile app and cross-device tracking
• Incorporate related tools and techniques: third-party data visualization, CRM integration for long-term value and lead qualification, marketing automation, phone conversion tracking, usability, and A/B testing
• Improve data storytelling and foster analytics adoption in the enterprise
Millions of organizations have installed Google Analytics, including an estimated 67 percent of Fortune 500 companies, but deficiencies plague most implementations, and inadequate reporting practices continue to hinder meaningful analysis. By following the strategies and techniques in Google Analytics Breakthrough, you can address the gaps in your own skill set, transcend the common limitations, and begin using Google Analytics for real competitive advantage. Critical contributions from industry luminaries such as Brian Clifton, Tim Ash, Bryan and Jeffrey Eisenberg, and Jim Sterne, along with a foreword by Avinash Kaushik, enhance the learning experience and empower you to drive consistent, real-world improvement through analytics.
Author: Jonathan Weber
Publisher: Novatec Editora
Release Date: 2016-05-05
Genre: Business & Economics
Whether you are a marketer with development skills or a full-fledged web analyst/developer, this book shows how to implement Google Analytics using Google Tag Manager to boost your web analytics work. Whether you are starting from scratch on a new site or re-engineering or improving a Google Analytics account you inherited, this book provides the tools you need. There is a reason so many organizations use Google Analytics. Effective collection of web analytics data through Google Analytics can reduce customer acquisition costs, convert visitors into customers, provide valuable feedback on new product initiatives, and offer insights that will grow your customer base. So where does Google Tag Manager fit in? With a growing list of features and rapid adoption across all industries, Google Tag Manager enables unprecedented collaboration between marketing and technical teams, lightning-fast updates to your site, and standardization of the most common tags for a company's internal tracking and marketing efforts. This book shows that, to get the rich data you are really after in order to better serve your users' needs, you need the tools that Google Tag Manager provides for a professional implementation of a Google Analytics measurement system on your site. Written by “data evangelist” and Google Analytics expert Jonathan Weber and the LunaMetrics team, this book offers foundational knowledge, a collection of practical Google Tag Manager recipes, proven best practices, and troubleshooting tips to get your implementation into excellent shape.
This book covers, among other topics:
• How to implement Google Analytics via Google Tag Manager
• How to customize Google Analytics for your specific situation
• How to use Google Tag Manager to track and analyze interactions across multiple devices and touchpoints
• How to extract data from Google Analytics and use Google BigQuery to analyze Big Data questions
Learn the right cutting-edge skills and knowledge to leverage Spark Streaming to implement a wide array of real-time, streaming applications. This book walks you through end-to-end real-time application development using real-world applications, data, and code. Taking an application-first approach, each chapter introduces use cases from a specific industry and uses publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation. The domains covered in Pro Spark Streaming include social media, the sharing economy, finance, online advertising, telecommunication, and IoT. In the last few years, Spark has become synonymous with big data processing. DStreams enhance the underlying Spark processing engine to support streaming analysis with a novel micro-batch processing model. Pro Spark Streaming by Zubair Nabi will enable you to become a specialist in latency-sensitive applications by leveraging the key features of DStreams, micro-batch processing, and functional programming. To this end, the book includes ready-to-deploy examples and actual code. Pro Spark Streaming will act as the bible of Spark Streaming.
What You'll Learn
• Discover Spark Streaming application development and best practices
• Work with the low-level details of discretized streams
• Optimize production-grade deployments of Spark Streaming via configuration recipes and instrumentation using Graphite, collectd, and Nagios
• Ingest data from disparate sources including MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver
• Integrate and couple with HBase, Cassandra, and Redis
• Take advantage of design patterns for side-effects and maintaining state across the Spark Streaming micro-batch model
• Implement real-time and scalable ETL using data frames, SparkSQL, Hive, and SparkR
• Use streaming machine learning, predictive analytics, and recommendations
• Mesh batch processing with stream processing via the Lambda architecture
Who This Book Is For
Data scientists, big data experts, BI analysts, and data architects.
Author: Rajdeep Dua
Release Date: 2016-10-31
Develop intelligent machine learning systems with Spark
About This Book
• Get to grips with the latest version of Apache Spark
• Utilize Spark's machine learning library to implement predictive analytics
• Leverage Spark's powerful tools to load, analyze, clean, and transform your data
Who This Book Is For
If you have a basic knowledge of machine learning and want to implement various machine-learning concepts in the context of Spark ML, this book is for you. You should be well versed in the Scala and Python languages.
What You Will Learn
• Get hands-on with the latest version of Spark ML
• Create your first Spark program with Scala and Python
• Set up and configure a development environment for Spark on your own computer, as well as on Amazon EC2
• Access public machine learning datasets and use Spark to load, process, clean, and transform data
• Use Spark's machine learning library to implement programs by utilizing well-known machine learning models
• Deal with large-scale text data, including feature extraction and using text data as input to your machine learning models
• Write Spark functions to evaluate the performance of your machine learning models
In Detail
Spark ML is the machine learning module of Spark. It uses in-memory RDDs to process machine learning models faster for clustering, classification, and regression. This book will teach you about popular machine learning algorithms and their implementation. You will learn how various machine learning concepts are implemented in the context of Spark ML. You will start by installing Spark on single-node and multinode clusters. Next you'll see how to execute Scala- and Python-based programs for Spark ML. Then we will take a few datasets and go deeper into clustering, classification, and regression. Toward the end, we will also cover text processing using Spark ML. Once you have learned the concepts, you can apply them to implement algorithms in green-field projects or to migrate existing systems to this new platform. You can migrate from Mahout or Scikit to use Spark ML.
Become an expert in the innovative containerization tool to unlock new opportunities in the way you use and deploy software
About This Book
• Harness the power of Docker to create a robust and resilient environment in which you can generate portable, composable, scalable, and stable application containers
• Learn the art of container networking with elevated efficiency using Docker
• Better manage Docker containers using expert techniques and methods
• Explore the ways to keep your Docker environment secure
• Deploy your applications easily
Who This Book Is For
Whether you are a developer or a sysadmin, or anything in between, this course will give you the guidance you need to use Docker to build, test, and deploy your applications and make these tasks easier, even enjoyable.
What You Will Learn
• Learn how to install Docker across all the platforms along with a few troubleshooting techniques
• Build, push, and publish images on Docker Hub
• Orchestrate multiple containers with Docker Compose
• Test and debug applications inside a Docker container
• Get to know the basics of networking and see how Docker networking works
• Discover the tools built into Docker to gain an insight into your container's performance
• Take advantage of the various SaaS offerings from third parties to move monitoring away from your local infrastructure and into the cloud
• Familiarize yourself with third-party tools such as Traffic Authorization, Summon, sVirt, and SELinux to secure your Docker environment
• Integrate Docker with a wide range of cloud and configuration tools to fully realize its potential
In Detail
Hot off the presses, the latest buzz on the tip of everyone's tongue and the topic of almost any conversation that includes containers these days is Docker! Docker has been a game-changer when it comes to virtualization. With this course, you will go from just being the person in the office who hears that buzz to the one who is tooting it around every day.
This course will be a smooth journey covering Docker from start to finish! The first module will help you get familiarized with the fundamentals of Docker. The second module will show you how to create, deploy, and manage a virtual network for connecting containers spanning single or multiple hosts. In the third module, you'll get to grips with monitoring your Docker apps and containers; this will show you how monitoring containers and keeping a keen eye on the working of applications helps improve the overall performance of the applications that run on Docker. The purpose of our fourth module, Securing Docker, is to provide techniques and enhance your skills to secure Docker containers easily and efficiently. Finally, you'll see how to deploy Docker in production and three interesting GUI applications: Shipyard, Panamax, and Tutum.
Style and approach
This course covers best practices to make sure you're confident with the basics, such as building, managing, and storing containers, before diving deeper into Docker security. You'll find everything you need to help you extend and integrate Docker in new and innovative ways.