Cloud Bigtable is a sparsely populated table that scales to billions of rows and thousands of columns, storing tens of terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, making it an ideal data source for MapReduce operations.
Cloud Bigtable's powerful back-end servers offer several important advantages over self-managed HBase installations:
1. Superior scalability
2. Simple management
3. Cluster resizing without downtime
Cloud Bigtable is ideal for applications that require extremely high throughput and scalability for key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications. Typical workloads include:
Time-series data - CPU and memory usage over time for multiple servers
Marketing data - purchase histories and customer preferences
Financial data - transaction histories, stock prices, and currency exchange rates
Internet of Things data - usage reports from energy meters and home appliances
Graph data - information about how users are connected to one another
Cloud Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. A table is composed of rows, each typically describing a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and related columns are usually grouped into a column family. Each column is identified by the combination of its column family and a column qualifier, which is a unique name within that family.
Consider, for example, a table that records which users follow other users. This table has a single column family, follows, and that family contains multiple column qualifiers, one for each followed user.
Column qualifiers themselves are used as data. This design is possible thanks to the sparseness of Cloud Bigtable tables and the ability to add new column qualifiers on the fly.
The user name is used as the row key. Assuming that user names are spread fairly evenly across the alphabet, data access will also be spread fairly evenly across the entire table. For more information on how Cloud Bigtable handles uneven workloads, see Load balancing and Row key selection.
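As a rough illustration of this data model, here is a minimal sketch that writes one row of such a follows-style table. It assumes the google-cloud-bigtable Python client, and the project, instance, and table IDs are placeholders.

```python
# Minimal sketch: writing one row where column qualifiers are themselves data.
# Assumes the google-cloud-bigtable client; IDs below are placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("follows-table")

# Row key: the username of the follower.
row = table.direct_row(b"alice")

# Each followed user becomes a column qualifier in the single "follows"
# column family, with a small placeholder value.
for followed in (b"bob", b"carol"):
    row.set_cell("follows", followed, b"1")

row.commit()
```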
All client requests go through a front-end server before they are sent to a Cloud Bigtable node; in the original Bigtable paper, these nodes are called "tablet servers". The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, a container for the cluster. A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, which spread the query workload across the nodes.
To get the most out of Cloud Bigtable's write performance, it is important to distribute writes as evenly as possible across nodes. One way to achieve this is to use row keys that do not follow a predictable order. For example, user names tend to be spread fairly evenly across the alphabet, so if you put the user name at the start of the row key, writes will be distributed relatively evenly.
At the same time, it is useful to group related rows next to each other so that several of them can be read together more efficiently. For example, if you store different types of weather data every hour, the row key might consist of the location where the data is collected followed by a timestamp (for example, Washington DC#201803061617). A row key of this form groups all of the data for one location into a contiguous range of rows. Rows for other locations start with a different identifier, and because many locations collect data at the same rate, writes are still distributed evenly across tablets.
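To make the pattern concrete, the sketch below builds such a row key in plain Python; the delimiter and timestamp format are illustrative choices, not something Cloud Bigtable prescribes.

```python
# Sketch of the location-then-timestamp row key pattern described above.
from datetime import datetime, timezone

def weather_row_key(location: str, event_time: datetime) -> bytes:
    """Build a row key like b'Washington DC#201803061617'."""
    return f"{location}#{event_time.strftime('%Y%m%d%H%M')}".encode("utf-8")

key = weather_row_key("Washington DC",
                      datetime(2018, 3, 6, 16, 17, tzinfo=timezone.utc))
# b'Washington DC#201803061617' -- rows for one location sort next to each
# other, so a range scan over that location reads contiguous rows.
```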
Cloud Bigtable uses intelligent algorithms to automatically compress data. You cannot configure compression settings for a table. However, it is useful to know how to store data for efficient compression.
Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as the page you are reading right now.
The compression efficiency is highest when the same values are close to each other, whether they are in the same row or in adjacent rows. You can compress the data efficiently by arranging row keys so that rows containing the same chunk of data are adjacent to each other.
Compress values larger than 1 MiB before storing them in Cloud Bigtable. Doing so saves CPU cycles, server memory, and network bandwidth, because Cloud Bigtable automatically disables its own compression for values larger than 1 MiB.
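As an example of this advice, the following sketch compresses a large value on the client with zlib before writing it. It assumes the google-cloud-bigtable Python client, and the IDs, row key, and column names are placeholders.

```python
# Sketch: compress a large value on the client before writing it, since
# Cloud Bigtable does not compress values larger than 1 MiB itself.
import zlib

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("documents")

large_value = b"example payload " * 100_000   # well over 1 MiB
compressed = zlib.compress(large_value, level=6)

# Store the compressed bytes as the cell value; readers must call
# zlib.decompress() on the value after reading it back.
row = table.direct_row(b"report#2018-03-06")
row.set_cell("docs", b"body", compressed)
row.commit()
```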
Cloud Bigtable schema design differs from RDBMS schema design. In Cloud Bigtable, a schema is a blueprint or model of a table that describes the structure of the following table components: row keys, column families (including their garbage-collection policies), and columns.
Cloud Bigtable is a key/value store, not a relational store. It does not support joins, and transactions are supported only within a single row. Each table has only one index, the row key; there are no secondary indexes. Each row key must be unique.
Rows are sorted lexicographically by row key, from the lowest to the highest byte string. Column families are not stored in any particular order. Columns are grouped by column family and sorted lexicographically within their family. The intersection of a row and a column can contain multiple timestamped cells; each cell holds a unique, timestamped version of the data for that row and column. Ideally, both reads and writes should be distributed evenly across the table's row space. Cloud Bigtable tables are sparse: a column takes up no space in a row that does not use it.
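The following sketch, again assuming the google-cloud-bigtable Python client and placeholder IDs, reads one row back and walks its column families, qualifiers, and timestamped cell versions:

```python
# Sketch: read a single row and inspect its timestamped cells.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("my-table")

row = table.read_row(b"alice")
if row is not None:
    # row.cells maps column family -> column qualifier -> list of versions,
    # each version carrying its own value and timestamp.
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            for cell in cells:
                print(family, qualifier, cell.value, cell.timestamp)
```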
Store datasets with similar schemas in the same table rather than in separate tables. Creating many small tables is a Cloud Bigtable anti-pattern: sending requests to many different tables increases connection overhead, and having multiple tables of different sizes can disrupt the behind-the-scenes load balancing.
Put related columns in the same column family. If a row contains multiple values that are related to one another, it is good practice to group the columns that contain them into the same column family. Grouping data as closely as possible avoids the need for complex filters and lets your most frequent read requests return only the information you need. Create no more than about 100 column families per table. Choose short but meaningful names for your column families. Put columns with different data-retention requirements in different column families.
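As a hedged example of separating retention requirements by column family, this sketch creates a table with two families that use different garbage-collection rules; the family names, retention periods, and IDs are illustrative assumptions, and it uses the google-cloud-bigtable Python client.

```python
# Sketch: one table, two column families with different retention rules.
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("user-events")

table.create(column_families={
    # Keep only the most recent version of profile data.
    "profile": column_family.MaxVersionsGCRule(1),
    # Expire raw event data after 30 days.
    "events": column_family.MaxAgeGCRule(datetime.timedelta(days=30)),
})
```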
Treat column qualifiers as data. Because every qualifier used in a row is stored in that row, giving qualifiers short but meaningful names reduces the amount of data transferred with each request. The maximum size of a column qualifier is 16 KB.
Do not store more than 100 MB of data in a single row; rows above this limit can degrade read performance. Store related items in adjacent rows to make reads more efficient.
Design the row key based on the queries you will use to retrieve the data. Keep row keys short. It is common to store multiple delimited values in each row key; the segments are usually separated by a delimiter such as a colon, slash, or hash symbol. A sketch of this pattern follows.
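The sketch below builds such delimited row keys and scans one key prefix; the region#server#minute layout and the IDs are illustrative assumptions, and treating prefix + 0xff as the end of the range is a simplification that works for keys of this shape.

```python
# Sketch: multi-segment row keys plus a prefix range scan.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("metrics")

def metric_row_key(region: str, server_id: str, minute: str) -> bytes:
    # e.g. b"us-east#server01#201803061617"
    return f"{region}#{server_id}#{minute}".encode("utf-8")

# Scan every row for one server by using its key prefix as a range.
prefix = b"us-east#server01#"
rows = table.read_rows(start_key=prefix, end_key=prefix + b"\xff")
for row in rows:
    print(row.row_key)
```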
Schema design for time-series data deserves particular attention. Cloud Bigtable is an excellent fit for storing time-series data: it stores data in schemaless rows and columns, each row has a row key, and row keys are kept in sorted order.
The most common time-series problem in Cloud Bigtable is hotspotting, which can affect any row key that contains a monotonically increasing value. For example, suppose a service uses resident registration numbers as row keys and is especially popular with users born between 1998 and 2008. The rows in that key range will hold far more data than those for other age groups, so the node responsible for that range becomes heavily loaded and performance suffers. Such an overloaded range of row keys is called a hotspot, and field promotion and salting can be used to prevent it.


[Figure: row keys salted by hashing into 3 buckets, matching the predicted cluster size of 3 nodes]
In general, prefer field promotion. Field promotion prevents hotspotting in almost all cases and makes it easier to design row keys that support your queries. Use salting only when field promotion does not solve the load-concentration problem.
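For a concrete comparison, here is a small plain-Python sketch of both techniques; the key layouts, field names, and bucket count are assumptions for illustration only.

```python
# Sketch contrasting field promotion and salting for time-series row keys.
import hashlib

NUM_SALT_BUCKETS = 3  # roughly matched to the expected number of nodes

def promoted_key(user_id: str, timestamp: str) -> bytes:
    # Field promotion: put a well-distributed field (user_id) in front of the
    # monotonically increasing timestamp so writes spread across the key space.
    return f"{user_id}#{timestamp}".encode("utf-8")

def salted_key(timestamp: str) -> bytes:
    # Salting: prefix the key with a deterministic hash bucket so that
    # consecutive timestamps land on different tablets.
    bucket = int(hashlib.md5(timestamp.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return f"{bucket}#{timestamp}".encode("utf-8")

print(promoted_key("alice", "201803061617"))   # b'alice#201803061617'
print(salted_key("201803061617"))              # e.g. b'2#201803061617'
```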
When you create a Cloud Bigtable instance, you select whether its clusters store data on solid-state drives (SSDs) or hard disk drives (HDDs). In most cases, SSD storage is the most efficient and cost-effective choice. HDD storage is sometimes appropriate for extremely large datasets (more than 10 TB) that are infrequently accessed and are not latency-sensitive.
SSD storage is significantly faster and has more predictable performance than HDD storage. HDD throughput is much more limited, and reads of individual rows on HDDs are especially slow. Unless you store a very large amount of data, the cost savings from HDDs are small compared with the cost of the nodes in your Cloud Bigtable cluster, so HDD storage is generally not recommended unless you store more than 10 TB of data.
HDD storage is appropriate only if all of the following are true:
1. You expect to store more than 10 TB of data.
2. The data will not back a user-facing or latency-sensitive application.
3. Your workload is either a batch workload that performs scans and writes with only occasional random reads of a small number of rows, or an archive where you write very large amounts of data that is very rarely read.
The choice between SSD and HDD storage that you make when creating a Cloud Bigtable instance and its clusters is permanent. You cannot use the Google Cloud Console to change the type of storage a cluster uses.
If you need to convert an existing HDD cluster to SSD, or vice versa, export the data from the existing instance and import it into a new one, or copy the data from one instance to another with a Dataflow or Hadoop MapReduce job. Because migrating an entire instance takes time, you might need to add nodes to the Cloud Bigtable cluster before migrating.