Posts

Showing posts from 2020

Azure Data Lake Analytics: In short

Azure is a great cloud platform, but its services mostly have terrible names and confusing, often heavily overlapping features. Azure Data Lake is another in this series of confusing services, so I have tried to make sense of it and what it does. Azure Data Lake has two tightly integrated components: (a) Azure Data Lake Analytics (ADLA) and (b) Azure Data Lake Storage (ADLS). ADLA lets us run analytical jobs (query, extract, aggregate, transform, output, etc.) on data that is stored in ADLS in the form of files. Below, ‘temp’ is the name of the ADLA account, and ‘tempdl’ is the name of the associated ADLS account, which acts as the data source for this ADLA account. As we can see below, ‘tempdl’ holds the following data, stored in the form of files. ADLA lets us query, transform and do other operations on this data using U-SQL (prope...

AZ-900 Microsoft Azure Fundamentals : Study Notes

List of resources to prepare for the AZ-900 certification:
1. Tim Warner's video playlist
2. Andrew Brown's 3-hour video (broken into individual videos for every topic)
3. My collection of Microsoft documentation links
Below are my own study notes, created from all of the above. Happy preparing.
Availability Options
An Azure geography is a discrete market, typically containing one or more regions, that preserves data residency and compliance boundaries.
An Azure region is a set of datacenters deployed close together and connected through a dedicated regional low-latency network.
Each Azure region is paired with another region within the same geography (such as US, Europe, or Asia) at least 300 miles away; together the two make a region pair.
Examples of geographies and corresponding regions:
Geography: Regions (location of datacenters)
India: Central India (Pune), South India (Chennai), West India (Mumbai)
Eur...

Example of Using SimpleHttpOperator to make POST call

Airflow has SimpleHttpOperator, which can be used to invoke REST APIs. However, using this operator is not exactly straightforward: Airflow needs to be told the connection parameters and any other information needed to connect to the external system. For this we need to create a Connection. Open the 'Connections' page through the Admin->Connections link. Expand the dropdown to see the types of connection available. For a REST call, create an HTTP connection. Give the host URL and any other details if required. Now, when we write our task using SimpleHttpOperator, we need to refer to the connection that was just created. The task below makes a POST call to the https://reqres.in/api/users API and passes it some data in JSON format. myHttpTask = SimpleHttpOperator( task_id='get_op', method='POST', http_conn_id='dcro', data=json.dumps({ "name":"Morpheus", "job":"L...
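To make it clearer what such a task ends up sending, here is a minimal sketch that builds a roughly equivalent HTTP request using only the Python standard library. It is not Airflow's own implementation. The split into a host (https://reqres.in, as it would be stored in the connection) and an endpoint (api/users), the helper name build_post_request, and the one-field payload are all assumptions made for illustration.

```python
import json
import urllib.request


def build_post_request(host: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for host + endpoint."""
    # Join host and endpoint with exactly one slash between them.
    url = host.rstrip("/") + "/" + endpoint.lstrip("/")
    # Serialize the payload the same way the task does, with json.dumps.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed split: the host lives in the connection, the endpoint in the task.
request = build_post_request("https://reqres.in", "api/users", {"name": "Morpheus"})
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. In Airflow, the host part comes from the connection named by http_conn_id, and the path is given through the operator's endpoint argument.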

How to set Trace Sampling rate for Jaeger in Istio?

The default trace sampling rate for Jaeger in Istio is 1%, meaning that, on average, only one in every 100 requests to your application produces a trace in Jaeger. This is good for a production environment, but in dev or QA you would want a higher sampling rate. Unfortunately, Istio doesn't provide any UI or dashboard to view or edit many of the settings and options it uses, and the related documentation is hard to find. If you are installing using istioctl, keep an eye out for the 'values.pilot.traceSampling' option. See more details about this and other options here. If Istio is already installed and running, we need to edit the 'istio-pilot' deployment and change the PILOT_TRACE_SAMPLING environment variable by running the command below: $ kubectl -n istio-system edit deploy istio-pilot This opens vim (or whatever your default text editor is) with the deployment config file and lets you edit the value (which ranges from 0.0 to 100.0). Just make ...
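A sampling rate is a probability, so sending 100 requests at 1% makes a trace likely rather than guaranteed. The sketch below works out the odds, assuming each request is sampled independently at the configured percentage. That independence is a simplifying assumption, and the function name is made up for this example.

```python
def chance_of_at_least_one_trace(sampling_percent: float, requests_sent: int) -> float:
    """Probability that at least one of the requests is sampled.

    Assumes each request is sampled independently with the given percentage.
    """
    p = sampling_percent / 100.0
    # Complement of "no request was sampled at all".
    return 1.0 - (1.0 - p) ** requests_sent


# Default 1% rate, 100 requests: a trace is likely but not certain.
print(round(chance_of_at_least_one_trace(1.0, 100), 3))  # 0.634

# At 100% every request is traced, which is usually what you want in dev or QA.
print(chance_of_at_least_one_trace(100.0, 1))  # 1.0
```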