FOTC
  • Products
    • Google Workspace
    • Google Cloud
  • Services
    • Cloud engineering as a service
    • Cloud Infrastructure Strategy Roadmap
    • Google AI
    • Landing Zone
    • Security audit
    • Technical support
  • About us
  • Startups
  • Resources
    • Case studies
    • Blog
    • Partner programme
  • Careers
Contact
ro pl hu en
  • Privacy policy

BigQuery – getting started

EN » Blog » BigQuery – getting started

Beata Socha

4 June 2023
BigQuery – getting started

Contents

  • What is Google’s BigQuery?
  • BigQuery business use cases
    • Optimising logistics
    • Cinematic success
    • Reading preferences
    • Stock price forecasting
    • Personalised playlists
  • BigQuery costs – how much does cloud data analysis cost?
    • Data analysis costs
    • Data storage cost
  • BigQuery tutorials
    • Reading the database
    • Data selection
    • Counting records
  • Test BigQuery for free

Data is the new gold – at least for all companies and organisations with views to expand. A business that is aware of the power hidden inside data carries out process analysis and introduces data-driven changes and improvements. Some go even further – with the technology at their disposal they predict trends, probable market changes and the potential consequences of business decisions they have yet to make.

As your business grows, so do the swathes of data you collect – gigabytes turn into terabytes and petabytes. To keep costs and the time it takes to generate a report at a reasonable level, you need the right tools. BigQuery can be a huge help with both. As the name indicates, it is a service dedicated to analysing Big Data, the fuel for so many analytics tools, including the popular Google Analytics.

What is Google’s BigQuery?

It’s a serverless, scalable cloud data warehouse service. It allows you to service millions of queries and perform advanced analytics on petabytes of data in SQL, without worrying about costly maintenance of advanced infrastructure, scaling up or down and balancing traffic.

BigQuery is one of the services available on Google Cloud. Your datasets are stored and processed on a stable, secure and scalable Google Cloud (here you can find out what is Google Cloud and how it can support your business). Using this service you can create your own analytics tool – for example a data warehouse – which can track progress and changes within and outside the company. You can take advantage of the built-in Machine Learning models within BigQuery, which offer a range of predictive and analytics tools for different business scenarios.

Serverless technology

BigQuery is a serverless service, which means that maintenance is fully automated. You don’t need to worry about maintaining the physical data centre, updating software, setting scaling rules, balancing traffic and complex security principles. Google Cloud, the service provider, makes sure they work seamlessly, while you can focus on running analytics without any distractions. According to an ESG study, the total cost of ownership (TCO) of a serverless BigQuery service is 26-34% lower over the course of three years compared to other available analytics tools.

Fast analysis

While databases are best suited for a “small query / fast response,” data warehouses are far more efficient when doing complex analysis on vast loads of information. BigQuery needs seconds to process terabytes of data, and about 3 minutes for a petabyte. If you have a load of historic and current data, analysing it with BigQuery will take up to several minutes. Other data analysis system would need even a few hours.

No source code changes

In order to use BigQuery, you don’t need to make major changes or alter the source code. That’s because BigQuery supports the ANSI SQL:2011 standard and offers free programming interfaces ODBC and JDBC.

Automated backup

The tool creates an automated backup and keeps the history of changes for 7 days. You can easily compare your results with the previous version or restore data.

Analysing data from different sources

You can analyse data from a variety of sources using BigQuery –from Google Marketing Platform, Google Analytics, YouTube and hundreds of external SaaS applications. You can import data manually or build a pipeline to automatically source data and clean it before sending it to BigQuery.

Multi-cloud analytics with BigQuery Omni

BigQuery Omni enables analysing data from other public clouds, without leaving the BigQuery interface. Its architecture is easily scalable and can handle significant workload. It uses a query engine called Dremel to access AWS or Azure clouds so there is no need to transfer data between the clouds.

Built-in Machine Learning (ML) models

One of the services available is BigQuery ML which is dedicated to creating and developing machine learning models by using standard SQL queries. BigQuery ML speeds up the development of ML products by reducing the coding and data transfer to a minimum. You can also integrate the service with Vertex AI and TensorFlow to create and train your own ML models.

Business Intelligence (BI) support

BigQuery BI Engine is an in-memory analysis function which analyses even large datasets almost in real time. BI Engine is integrated with data visualisation tool Data Studio which presents results in form of clear dashboards. The service also supports ad hoc reports which encompass a sliver of the database. To generate these reports you don’t need to involve your analytics or IT trams.

Find out more:

  • Cloud SQL – a cloud database. What is it and why is it worth using?
  • What is Google Cloud Storage, and how to use it?

BigQuery business use cases

Here are some examples how companies use BigQuery capabilities:

Optimising logistics

UPS collects data on hundreds of thousands of parcels: their weight, size, and location. By using machine learning and data analysis, the company can pinpoint the most efficient way of loading delivery vans and containers. The system also allows UPS to prepare for periods of increased traffic, like Christmas.

Cinematic success

20th Century Fox uses Google Cloud, including BigQuery and machine learning. It creates preference profiles for their clients, and analyses movie trailers to determine how popular with audiences a new production will be.

Reading preferences

Safari Books Online is an internet library serving millions of users, comprising over 30,000 books and films. Safari Books Online uses BigQuery to match titles to the preferences to individual readers.

Stock price forecasting

Dow Jones (DJIA) is one of the largest stock exchange indices, which includes giants such as Apple, Amazon and Boeing. By using BigQuery and ML, Dow has created Data, News & Analytics Platform (DNA), which enables companies to run scenario simulations and predict how events (including floods, hurricanes and other natural disasters) will impact the index.

Personalised playlists

Spotify uses BigQuery to offer each of its 400 million users personalised playlists that can change every few hours depending on the music the user is listening to.

BigQuery costs – how much does cloud data analysis cost?

The standard payment model for cloud services is pay-as-you-use, which means you pay only for the resources you actually employ. BigQuery pricing consists of two major elements: data analysis and data storage costs. 

Data analysis costs

You can choose between two payment models for data analysis:

  • on-demand – a flexible model on the number of bytes processed. The first terabyte of data is analysed for free, the following cost $5 per 1 TB. This is a good model for you if you plan to analyse data sporadically;
  • flat-rate – this is a model where you acquire slots and reserve specific vCPU resources that are later used for data analysis. You can reserve a minimum of 60 seconds (flex plan), for a 30 days (monthly plan) or for 365 days (annual plan). The further ahead you plan, the lower the price. This is a solution for companies that want to use the service frequently or continuously. The flat-rate depends on the time frame and the number of slots. You can find more information about this payment plan in Google Cloud pricing documentation.

Data storage cost

Data storage is also done in two tiers:

  • active storage – for tables and partitions that have been modified over the past 90 days. The first 10 GB per month is free, the regular price is $0.020 per GB;
  • long-term storage – for tables and partitions that gave not been modified in the last 90 days. This type of storage is 50% cheaper than active storage: $0.010 per GB, while the first 10 GB a month is stored for free. The storage class changes automatically to long-term after 90 days without data modifications.

Additional costs can encompass ingesting and exporting data, sending data between locations or access through Storage Read API. In order to receive an exact estimate, use cost calculator or consult Google Cloud pricing documentation.

Check out BigQuery for free

FOTC offers 24/7 support, as well as a discount for Google Cloud services

Find out more

BigQuery tutorials

Your first steps in the service

To start using BigQuery, you need a Google Cloud account (here you can read about the top 20 advantages of Google Cloud). You will also need to know SQL.

Go to your Google Cloud console.

Choose the project you want to work on. The list of projects is available from the drop down menu next to the Google Cloud header.

wybór projektu z listy

Once you’ve selected the project, turn on Cloudshell in the top right corner.

uruchamianie cloudshell

You will see this view:

cloudshell widok ogólny

Reading the database

To get an overview of the database (data structure, number of rows, size) use this command:

bq show bigquery-public-data:samples.shakespeare

And this will be the result of your query:

cloudshell wywołanie komendy

In bq commands we will be using a flag –use_leagcy_sql – it sets standard SQL as the query language.

To validate and verify a query, all you need to do is add a dry_run flag. This is what a sample query with this flag looks like:

bq query --use_legacy_sql=false --dry_run \
'SELECT
word
FROM
`bigquery-public-data`.samples.shakespeare
LIMIT 5'
walidacja zapytania SQL

If the query is correct, you will see its result by removing the flag –dry_run

bq query --use_legacy_sql=false \
'SELECT
word
FROM
`bigquery-public-data`.samples.shakespeare
LIMIT 5'

This is the result of the sample query:

wynik zapytania SQL

Data selection

To separate specific data from the records, you can use the where clause that will select data for a specific value, rule or regular expression. If, for instance, you want to select only records that contain the word raising in the word column, all you need to do is use the following query:

bq query --use_legacy_sql=false --dry_run \
'SELECT
word
FROM
`bigquery-public-data`.samples.shakespeare
WHERE
word = raising
LIMIT 5'

First you need to validate the query.

walidacja zapytania SQL

Then run the following command without –dry_run:

bq query --use_legacy_sql=false \
'SELECT
word
FROM
`bigquery-public-data`.samples.shakespeare
WHERE
word = raising
LIMIT 5'
wynik zapytania SQL

Counting records

If you want to check how many records with this word the database contains, use the aggregating function COUNT(record you want to calculate). The query should look like this:

bq query --use_legacy_sql=false \
'SELECT
COUNT(word) as num_of_words
FROM
`bigquery-public-data`.samples.shakespeare
WHERE
word = raising’

Every table can be renamed using the keyword as. In this case we haved a descriptive name aggregated in the column. The result of the query is as follows:

użycie funkcji count dla zapytania SQL

Let’s assume you want to find all records containing the word raising. In this case use the keyword LIKE, which will allow you to search by a given key. 

If you place the % character before the word you are searching for, the system will include prefixed results to, e.g. dispraising. If you add the percentage symbol on both sides %raising%, you will get results including both prefixes and suffixes, e.g. dispraisingly.

When you have more than one column with an aggregating function, you need to indicate with GROUP BY which column you want to group the columns by. 

bq query --use_legacy_sql=false \
'SELECT
word,
COUNT(word_count) AS count
FROM
`bigquery-public-data`.samples.shakespeare
WHERE
word LIKE raising
GROUP BY
word'

This is the result of the query:

użycie funkcji count dla wielu kolumn

Test BigQuery for free

It is best to test BigQuery’s capabilities in practice. If you want to wet your toes and see how Google Cloud solutions (including BigQuery analytics) can benefit your company, contact a certified Google Cloud Premier Partner. 

FOTC experts will help you discover BigQuery potential as well as provide support with your data analysis in the cloud.

Contents
What is Google’s BigQuery?
BigQuery business use cases
BigQuery costs – how much does cloud data analysis cost?
BigQuery tutorials
Test BigQuery for free

Beata Socha

Writer, journalist, storyteller with 15 years' experience in creating high quality copy. At FOTC, Beata works as Content Manager.

Services
  • Cloud Infrastructure Strategy Roadmap
  • Landing Zone
  • Training
Products
  • Google Workspace
  • Google Cloud
  • Google Workspace for Education
Industry
  • Education
  • Gaming
  • Government
  • Healthcare
  • Retail
  • Small and medium businesses
Knowledge
  • Blog
  • Case Studies
  • NIS2 directive
Company
  • About us
  • Career
  • Contact
  • Partner programme
  • Google Workspace Support
  • Privacy Policy
  • Regulations
Copyright © 2014 – 2024 Fly On The Cloud sp. z o.o. KRS: 0000500884, NIP: 8971797086, REGON: 022370270