REST API {That Actually Does Something} in Golang
tl;dr — I’m going to create a REST API using golang that takes in data, applies a transformation to it (‘normalizes it’ — for my machine learning folk out there) and responds to the client with the transformed data.
So often we come across a Medium post that looks promising and related to the exact problem we are trying to solve. We spend our precious minutes combing through a “10 minute read” only to find the author demonstrating the same basic “example code” that everyone else already has in another post. I mean no offense to those dedicated hackers trying to shed a little knowledge (I’m probably guilty of doing this myself somewhere out there on the web). But I always seem to need something a little deeper, something a little meatier. Knowing that there are more out there like me, I’m going to take the initiative of bridging the gap between boilerplate and “actually relevant to my problem”. With that being said, here’s a golang REST API that actually does something.
So what do I mean by an API that actually does something? Sure, all examples of Go APIs out there do *something*. They show the typical CRUD processes. If you’re unfamiliar with CRUD, it simply means Create.Read.Update.DESTROY!!! I guess “Delete” doesn’t sound cool enough. They illustrate creating data to be stored somewhere, reading data from storage somewhere, updating that data being stored somewhere, and DESTROYING data being stored somewhere. But I’m a Data Scientist. I need actual data transformations. I need a trained model to make predictions on new values. I need data imputed with quality estimates. I do not need the capability to retrieve a book from a database with a particular ISBN. Hence, REST API that actually does something. For those of you who are unfamiliar with REST APIs, bluntly put, they do this:
[server]: “I’m listening at this URI for a request. If you send me data that looks like *this* — I’ll do something and respond with data that looks like *that*”
[client]: “Ok — here’s my data, now gimme gimme gimme”
[server]: “Chill out bro — I’m processing over here”
…….A few moments later
[server]: “I did what you wanted, impatient jerk”
[client]: “Yay =) “
Notice how the server doesn’t care about credentials, logins, or even what language built the request and sent the data. As long as you provide the data in the format the server is expecting, it will respond. That’s not to say you can’t add a layer of security to your REST API. There are authentication headers, authorizations, timestamps, tokens, blah blah blah. But these topics are outside the scope of this read. We just want to do something.
The API approach is beneficial in my field because there are 800 (hyperbole) different programming languages being utilized by various teams. With REST APIs, it doesn’t matter which one they’re using. They can still make use of the code I’ve created. This saves time and removes the need to ever rewrite code that has ALREADY BEEN WRITTEN! Not that I’m speaking from experience or anything. So let’s begin.
We are going to create an API that expects a list of values and normalizes them. This is a common practice in machine learning before inputs are fed into a model. And sure, there are modules in a lot of languages that already do this. But I need to give you a relevant example and I can’t post actual code from work on here because…..you know….IT compliance and all. Corporate Espionage is alive and well people.
First, we will follow the normal web and REST API templates out there by instantiating a mux router, registering a handler function, and establishing which port to listen on for requests.
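For anyone following along without the original listing, here is a minimal sketch of that skeleton. It is not the author's exact code: it swaps in the standard library's http.ServeMux for the mux router so it runs with no third-party packages, the names listenAddr, normalizePath and newRouter are made up for illustration, and the normalize stub only exists so the program compiles. The line numbers mentioned in the text refer to the original listing, not to this sketch.

```go
package main

import (
	"log"
	"net/http"
)

const (
	listenAddr    = ":8000"      // port named in the article
	normalizePath = "/normalize" // path named in the article
)

// normalize is only a placeholder at this stage. It enforces the PUT
// method (ServeMux does not filter by method here) and does nothing else.
func normalize(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPut {
		w.Header().Set("Allow", http.MethodPut)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	http.Error(w, "not implemented yet", http.StatusNotImplemented)
}

// newRouter builds the request multiplexer and registers the single route.
func newRouter() *http.ServeMux {
	r := http.NewServeMux()
	r.HandleFunc(normalizePath, normalize)
	return r
}

func main() {
	// ListenAndServe blocks until the server stops and always returns
	// a non-nil error, so log.Fatal reports why it stopped.
	log.Fatal(http.ListenAndServe(listenAddr, newRouter()))
}
```

With a third-party router the method check usually moves out of the handler and into the route registration, which is likely what the original listing did.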
Bluntly put, we created a server that listens on localhost at port 8000 (line 18). When a request is received, the mux router handles it (r variable created at line 12 and being passed in as a variable at line 18) by matching the incoming request with the proper handler. With this basic code, it will handle it by doing, well, absolutely nothing. The reason is that we have one handler (line 15) which, when a request is made on port 8000 to localhost at the path ‘/normalize’, calls the function normalize. Well, we haven’t created that function yet. So let’s do that next. One thing to note, the method for the handler function is “PUT”. REST APIs use HTTP and HTTPS requests, which come with various methods:
- GET
- PUT
- POST
- DELETE
- PATCH
and a few others. We are using PUT because, bluntly answered, it is one of the methods (along with POST and PATCH) that is meant to carry a request body. Parsing a request body on something like a GET is a bad idea and a big no-no, because the HTTP spec gives that body no defined meaning. I’ll leave further intuition on that to the reader’s research. Just know that our data will be put in the request body of the HTTP request, not the query string, because URLs run into practical size limits. All of that will make sense in upcoming sections.
We are going to build the normalize function out piece by piece since there will be a lot of moving parts.
First, notice that we’ve imported a new package “io/ioutil”. Next, when the HandleFunc calls the function normalize (line 26), it by default passes 2 objects to the function, a response writer and the request itself. This is why our normalize function (line 11) has inputs:
- w — which is type http.ResponseWriter
- r — which is type *http.Request
I’m not going to go into detail about the * in front of the http.Request type. I leave the reader to research the topic of pointers.
r is what we are interested in here. That is all the data and metadata within the request to our API. More specifically, we care about the body of the request. The request body is sent to the server as a bunch of bytes that we will need to unmarshal. To initially handle these bytes, we must first read them in and store them. Enter ioutil.ReadAll() (line 14). This function accepts the request body and returns the bytes it read plus any error it may have come across. If there is an error, it is written back to the client (line 15 and 16). Now that we have read in the body bytes, we can proceed to unmarshaling them.
WAIT…..Marshaling???
From grand ole wikipedia
In computer science, marshalling or marshaling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission, and it is typically used when data must be moved between different parts of a computer program or from one program to another. Marshalling is similar to serialization and is used to communicate to remote objects with an object, in this case a serialized object. It simplifies complex communication, using composite objects in order to communicate instead of primitives. The inverse of marshalling is called unmarshalling (or demarshalling, similar to deserialization).
Bluntly put — unmarshalling in this context is the process of mapping an incoming request (JSON object in our case) to an object in our domain (a variable).
We’ve added two new lines to our normalize function (line 21 and 23). First we declared the variable “data” and set its type as “responseBody”. Then we used the standard library’s json.Unmarshal function to unmarshal the request body bytes into the “data” variable. The & in front of data, again, has to do with pointers. If you’re following along, then there should be a red flag with what we just added. What the heck is a responseBody type? Well ladies and gents, that is up to you to decide……but not really. I’m going to do it.
Remember earlier where I pointed out that the REST API expects incoming data in a specific form? Well that form is responseBody. Since the ultimate goal is to normalize input data for a machine learning model, the incoming data should be a JSON object with only one key:value pairing — the data itself.
So above the normalize function, we created a custom struct for the incoming request to be mapped to. Within this struct, there is an attribute called Data that is being declared as a slice of type float32 and it is being mapped to the “data” key of the incoming JSON. If this is confusing to you, then pause, read up on structs, slices and arrays in go, then come back. Don’t worry, I’ll wait.
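Here is a rough sketch of the read-then-unmarshal steps and the struct described above, not the author's original listing. The responseBody name, the Data field and the “data” key come from the text. The decode helper, the status codes and the plain-text reply are assumptions added so the example runs on its own. It uses ioutil.ReadAll to match the text; on Go 1.16 and later, io.ReadAll is the recommended replacement.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

// responseBody keeps the article's name, although it actually describes
// the JSON object the client sends in: {"data": [1.0, 2.0, ...]}.
type responseBody struct {
	Data []float32 `json:"data"`
}

// decode is a hypothetical helper that unmarshals raw JSON bytes
// into a responseBody value.
func decode(raw []byte) (responseBody, error) {
	var data responseBody
	err := json.Unmarshal(raw, &data)
	return data, err
}

func normalize(w http.ResponseWriter, r *http.Request) {
	// Read every byte of the request body into memory.
	raw, err := ioutil.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Map the JSON bytes onto our struct.
	data, err := decode(raw)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// No transformation yet; just confirm what arrived.
	fmt.Fprintf(w, "received %d values\n", len(data.Data))
}

func main() {
	r := http.NewServeMux() // stdlib stand-in for the mux router
	r.HandleFunc("/normalize", normalize)
	log.Fatal(http.ListenAndServe(":8000", r))
}
```

Keeping the unmarshal step in its own small function makes it easy to check the struct mapping without starting the server.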
You’re back! Great. So quick recap
- We have initialized a mux router listening for requests on port 8000
- We have created a function for the mux router to invoke when a request is received
- We have established the form that the incoming data should be in
All that is left is actually transforming the data and serving it to the client. In order to normalize data, the most common method is simply calculating the Z-score for each data point. This sounds daunting, but it’s actually simple:
z_i = (x_i - x_bar) / s
where
- x_i → data point
- x_bar → mean of the data
- s → standard deviation of the data
This quickly reveals to us that we need to create at least 3 more functions: one to calculate the mean, one to calculate standard deviation, and the other to calculate z-scores for all the data. Let’s start with the mean.
I’ll run through this function real quick. It takes in “x” which is a slice of values of type float32. The function will return a single value of type float32. Two variables are initialized, “sum” and “length”, both of type float32. The function then loops through all the values in the variable “x” and calculates a sum of all the values. The total number of values is stored in variable “length”, which is literally the length of the x slice passed into the function. The mean is then calculated and returned.
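A minimal sketch of a mean function matching that description could look like the following. The variable names come from the text; the small main function is only there to make it runnable.

```go
package main

import "fmt"

// mean returns the arithmetic mean of x.
// Note: an empty slice yields NaN (0 divided by 0), so callers
// should make sure x is not empty.
func mean(x []float32) float32 {
	var sum float32
	length := float32(len(x))
	for _, v := range x {
		sum += v
	}
	return sum / length
}

func main() {
	fmt.Println(mean([]float32{1, 2, 3, 4})) // prints 2.5
}
```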
Now for the standard deviation function.
This one almost mimics the mean function, and actually makes use of the mean function. Quick rundown — The function accepts “x” which is a slice of values all of type float32, and returns a single value of float32. Variables “avg” and “ssd” are initialized as type float32, and variable “length” is immediately set to the length of the slice “x”, converted to float32. The variable “avg” calls the mean function we just created and finds the mean of the data. The function then loops through the data, accumulating the squared deviations from the mean in “ssd”, and uses that to calculate the standard deviation. Cool.
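Here is one way that function might be written, as a sketch. One assumption to flag loudly: it divides by n (the population standard deviation) rather than n - 1. The text does not say which the author used, but dividing by n is what sklearn's StandardScaler does, so it is the choice that lines up with the comparison later in the post.

```go
package main

import (
	"fmt"
	"math"
)

// mean returns the arithmetic mean of x.
func mean(x []float32) float32 {
	var sum float32
	for _, v := range x {
		sum += v
	}
	return sum / float32(len(x))
}

// std returns the population standard deviation of x
// (assumption: divide by n, not n - 1).
func std(x []float32) float32 {
	var ssd float32 // running sum of squared deviations
	length := float32(len(x))
	avg := mean(x)
	for _, v := range x {
		d := v - avg
		ssd += d * d
	}
	// math.Sqrt works on float64, so convert there and back.
	return float32(math.Sqrt(float64(ssd / length)))
}

func main() {
	fmt.Println(std([]float32{2, 4, 4, 4, 5, 5, 7, 9})) // prints 2
}
```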
Finally, we need a function to make use of these two functions and normalize our data.
Again, this one is like the other two in design. The only difference is the function takes in a slice of values of type float32 and returns another slice of values of type float32. This returned slice is the final array of transformed data. Now that we have all functionality in place, we only need to call the “norm” function from the “normalize” function, and we are done. Here is the full code
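Since the original listing is not shown here, the following is a sketch of what the complete program could look like, assembled from the pieces described above. Several things are assumptions rather than the author's choices: the standard library ServeMux stands in for the mux router, the response reuses the {"data": [...]} shape, std divides by n, constant input is mapped to zeros, and empty input is rejected.

```go
package main

import (
	"encoding/json"
	"io/ioutil"
	"log"
	"math"
	"net/http"
)

// responseBody is used for both the incoming and outgoing JSON here.
type responseBody struct {
	Data []float32 `json:"data"`
}

func mean(x []float32) float32 {
	var sum float32
	for _, v := range x {
		sum += v
	}
	return sum / float32(len(x))
}

// std is the population standard deviation (divide by n).
func std(x []float32) float32 {
	var ssd float32
	avg := mean(x)
	for _, v := range x {
		ssd += (v - avg) * (v - avg)
	}
	return float32(math.Sqrt(float64(ssd / float32(len(x)))))
}

// norm returns the z-score of every value in x.
func norm(x []float32) []float32 {
	out := make([]float32, len(x))
	avg, s := mean(x), std(x)
	if s == 0 {
		return out // assumption: constant input maps to all zeros
	}
	for i, v := range x {
		out[i] = (v - avg) / s
	}
	return out
}

func normalize(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPut {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	raw, err := ioutil.ReadAll(r.Body) // io.ReadAll on Go 1.16+
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	var data responseBody
	if err := json.Unmarshal(raw, &data); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if len(data.Data) == 0 {
		http.Error(w, "data must be a non-empty list of numbers", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(responseBody{Data: norm(data.Data)}); err != nil {
		log.Printf("encode response: %v", err)
	}
}

func main() {
	r := http.NewServeMux()
	r.HandleFunc("/normalize", normalize)
	log.Fatal(http.ListenAndServe(":8000", r))
}
```

The zero check in norm matters because dividing by a standard deviation of zero produces NaN, and encoding/json refuses to encode NaN values.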
We will use Python to test out our new API. I’m going to use a Python library called sklearn to normalize data that I will randomly generate. Then I will pass that same data to my API and compare the results of the two. In order to launch our REST API, simply open a terminal or command prompt, navigate to where your Go file resides, and type:
go run main.go
EDIT: When you invoke “go run *.go”, Go compiles your code into a temporary binary (stored in your computer’s temp folder) and then executes it. Because of this, “go run” only works on a main package: the package statement at the very top of your file must be “package main”, and the file must define a main function. The file itself does not have to be called “main.go”.
My particular file is called main.go. If you named your file something else, then just exchange my file name with your filename.
If you did everything right, bluntly put, nothing should happen. Your server is now running on your machine and listening on port 8000 for requests. First, I’ll create the normalized values using a ready-made Python library.
We have randomly generated data, and its scaled counterpart, the z-scores. The scaling was done by sklearn’s StandardScaler. Next we will send the same data to our API and see what it returns. First, I’ll show the JSON object that is going to be sent to the API.
As you can see, it is in the form that the API expects. ‘data’ is the key, and then a list of numbers as the value.
Finally, we can see I sent a “put” request to the URI of ‘http://localhost:8000/normalize’. That is the URI we set up our server to listen on. In the ‘put’ request, I embedded the JSON object of values to be normalized and the server responded with values that match exactly what the sklearn library produced.
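Since the whole point is that the client language doesn't matter, here is a sketch of the same kind of PUT request made from Go instead of Python. The URI and the “data” key come from the text. The payload type, the buildPayload helper and the sample values are made up for illustration, and the program assumes the server is already running on port 8000.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
)

// payload is a hypothetical client-side mirror of the JSON the API expects.
type payload struct {
	Data []float32 `json:"data"`
}

// buildPayload marshals the values into the {"data": [...]} form.
func buildPayload(values []float32) ([]byte, error) {
	return json.Marshal(payload{Data: values})
}

func main() {
	body, err := buildPayload([]float32{2, 4, 4, 4, 5, 5, 7, 9})
	if err != nil {
		fmt.Println("could not build payload:", err)
		return
	}
	// http.Client has no Put shortcut, so build the request by hand.
	req, err := http.NewRequest(http.MethodPut, "http://localhost:8000/normalize", bytes.NewReader(body))
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed (is the server running?):", err)
		return
	}
	defer resp.Body.Close()

	out, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read response:", err)
		return
	}
	fmt.Println(resp.Status, string(out))
}
```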
Thanks for reading, and stay tuned for more.