
How to transfer data from Cloud Datastore to BigQuery in Google Cloud Platform

If you are here, I assume you want to migrate data from Cloud Datastore to BigQuery because you want to do some analysis and are frustrated by the limitations of GQL (Google Query Language).

First of all, you need to create a backup of the data in Datastore. Use the Datastore Admin tool provided by Google to take a backup and store it automatically in a Cloud Storage bucket.

Select all the entities and press 'Backup Entities'. Give the backup a name, select Google Cloud Storage as the backup storage destination and specify a bucket name.

Once the backup job is completed, you will see the backup listed. You can select a backup and press 'Info' to see its details.

Go to the bucket mentioned in 'Handle' and you will see the file listed in the backup details. You will also see many more files with similar names, ending with .backup_info (e.g. ahRzfmpkYS1wZC1zbG8tc2FuZGJveHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhix_-4DDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.JobDetailsEntity.backup_info).

Each of these is the backup file for one specific entity, and it is the file you will need to specify when creating a table in BigQuery.
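If the bucket holds a lot of files, a small filter on the file names can pick out the per-entity ones. This is only a rough sketch: it assumes the naming pattern seen in the example above, `<handle>.<EntityName>.backup_info`, and the function name `kind_backup_files` and the sample names are made up for illustration.

```python
def kind_backup_files(object_names):
    """Map entity name -> per-entity .backup_info file name.

    Assumption: per-entity files are named <handle>.<EntityName>.backup_info.
    Names that do not have the <EntityName> part are skipped.
    """
    suffix = ".backup_info"
    result = {}
    for name in object_names:
        if not name.endswith(suffix):
            continue
        stem = name[: -len(suffix)]
        # Split on the last dot: everything before it is the handle,
        # everything after it is the entity name.
        handle, dot, kind = stem.rpartition(".")
        if dot and handle and kind:
            result[kind] = name
    return result


# Hypothetical listing of a bucket; real names are much longer.
listing = [
    "abc123.backup_info",
    "abc123.JobDetailsEntity.backup_info",
    "some_other_file.txt",
]
per_entity = kind_backup_files(listing)
```

Here `per_entity["JobDetailsEntity"]` gives the file name to use for that entity's table.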

Head over to BigQuery, create a new dataset, and then create a new table in it.

In the 'Location' field select 'Google Cloud Storage' and give the location of the backup file for the specific entity. The file format is 'Cloud Datastore Backup'.

For example, using the file we found earlier: gs://jda_so__78700310-e2f9-4cf2-8f20-dd325de09a4d_data_bkup/ahRzfmpkYS1wZC1zbG8tc2FuZGJveHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhix_-4DDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.JobDetailsEntity.backup_info

Here the bucket name, jda_so__78700310-e2f9-4cf2-8f20-dd325de09a4d_data_bkup, comes from the 'Handle' field in the backup information in Datastore Admin, and the file name is the one you found in the previous step.
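Putting the two pieces together by hand is error-prone with names this long, so here is a minimal sketch of how the location could be assembled. The helper name `backup_uri` is my own, and the check on the `.backup_info` ending is just a sanity check I added, not something BigQuery requires from this function.

```python
def backup_uri(bucket, file_name):
    """Join a bucket name and a .backup_info file name into a gs:// location."""
    # Accept the bucket with or without the gs:// prefix.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    file_name = file_name.lstrip("/")
    if not file_name.endswith(".backup_info"):
        raise ValueError("expected a .backup_info file, got: " + file_name)
    return "gs://{}/{}".format(bucket, file_name)


# Bucket name from this article; the file name is shortened here for readability.
uri = backup_uri(
    "jda_so__78700310-e2f9-4cf2-8f20-dd325de09a4d_data_bkup",
    "example_handle.JobDetailsEntity.backup_info",
)
```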

Specify the name of the table you want to create in BigQuery in the 'Destination' field. Press 'Create Table' and, if everything is correct, the job will complete successfully. Select the table from the left panel and click 'Preview' to see the populated data. And you are done!
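If you would rather script the load than click through the UI, the `bq` command-line tool has a `DATASTORE_BACKUP` source format that should do the same job. The sketch below only builds and runs that command; it assumes `bq` (from the Cloud SDK) is installed and authenticated, and the dataset name `my_dataset`, the table name `job_details` and the bucket and file names are placeholders I made up.

```python
import subprocess


def bq_load_command(dataset, table, uri):
    """Build the bq command that loads a Datastore backup into dataset.table."""
    return [
        "bq",
        "load",
        "--source_format=DATASTORE_BACKUP",
        "{}.{}".format(dataset, table),
        uri,
    ]


def run_load(dataset, table, uri):
    """Run the load; raises CalledProcessError if bq reports a failure."""
    subprocess.run(bq_load_command(dataset, table, uri), check=True)


# Placeholder names; replace them with your own dataset, table and backup file.
command = bq_load_command(
    "my_dataset",
    "job_details",
    "gs://my_bucket/example_handle.JobDetailsEntity.backup_info",
)
print(" ".join(command))
```

Calling `run_load(...)` with the same arguments actually starts the load job, so I left it out of the example.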

Let me know in the comments if you have any questions.

