Last time, we explored the benefits of validating data files before they enter your ecosystem. Doing so brings real efficiencies for you and your stakeholders, and it also improves the security and cleanliness of your data. I went on to talk about how powerful it could be to send these files through an API. Every data file would then be validated within seconds, at the first point of contact, rather than you finding out later that it was full of errors, bad types or missing data points. If you haven't read my previous post yet, it's worth checking out first so you don't miss anything here: Frictionless example
Now let’s get into the details, shall we?
Previously I showed the Python file validation part. The main difference this time is that, instead of hard-coding the file path, the script reads it from the command line with "file_path = sys.argv[1]", so another program can pass in whichever file it needs checked. Here is the full script:
```python
from frictionless import validate
import sys
import os


def validate_file(file_path):
    try:
        # Use Frictionless to validate the file
        return validate(file_path)
    except Exception as e:
        # Report the problem and return None so main() knows there is no report
        print(f"Error occurred while validating file: {e}")
        return None


def output_validation_results(report):
    if report.valid:
        print("Validation successful! No issues found.")
    else:
        print("Validation failed! Issues found:")
        # flatten() turns each error into a list (its position and error type by default)
        for error in report.flatten():
            print(f"- {error}")


def main():
    # The file path is passed in as the first command-line argument
    if len(sys.argv) < 2:
        print(f"Usage: python {sys.argv[0]} <file_path>")
        return
    file_path = sys.argv[1]
    if not os.path.exists(file_path):
        print("File not found.")
        return
    print(f"Validating file: {file_path}")
    # Validate the provided file
    report = validate_file(file_path)
    if report is None:
        return
    # Output validation results
    output_validation_results(report)


if __name__ == "__main__":
    main()
```
And here is the invalid CSV file we tested it against:
```csv
id,name,,name
1,english
1,english
2,german,1,2,3
```
Remember, this example file may be small, but I have tested this on much larger files with thousands of data points and more, and it is just as quick. Now, how do we put this Python script behind an API? I turned to Go for this task, as it has been enjoyable to work with in many of my recent projects. Go's performance and static typing make it an excellent choice for handling API endpoints, and its standard library (net/http) lets you stand up an HTTP server with very little code and no third-party framework. Go doesn't execute Python code itself; instead, its os/exec package lets it launch the Python script as a separate process and capture whatever the script prints. This lets each language play to its strengths: Python and Frictionless do the validation, while Go serves the results quickly and reliably over HTTP.
Let's take a look at the code:
```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
)

// FileVerificationResult represents the result of the file verification process
type FileVerificationResult struct {
	FileName string `json:"file_name"`
	Verified bool   `json:"verified"`
	Message  string `json:"message,omitempty"`
}

// handleFileUpload handles the file upload request
func handleFileUpload(w http.ResponseWriter, r *http.Request) {
	// Only accept POST requests
	if r.Method != http.MethodPost {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	// Parse the incoming file from the "file" form field
	file, _, err := r.FormFile("file")
	if err != nil {
		http.Error(w, "Failed to parse file", http.StatusBadRequest)
		return
	}
	defer file.Close()

	// Save the upload to a local file (note: every request reuses this name)
	tempFile, err := os.Create("uploaded-file.csv")
	if err != nil {
		http.Error(w, "Failed to create temporary file", http.StatusInternalServerError)
		return
	}
	_, copyErr := io.Copy(tempFile, file)
	// Close the file before Python opens it, and report either failure
	closeErr := tempFile.Close()
	if copyErr != nil || closeErr != nil {
		http.Error(w, "Failed to save file", http.StatusInternalServerError)
		return
	}

	// Call the Python script for verification (replace the placeholder name)
	cmd := exec.Command("python", "insert_python_file_validation.py", tempFile.Name())
	output, err := cmd.Output()
	if err != nil {
		log.Printf("validation script failed: %v", err)
		http.Error(w, "Failed to validate file", http.StatusInternalServerError)
		return
	}

	// Set the Content-Type header to plain text
	w.Header().Set("Content-Type", "text/plain")
	// Write the validation output as the response
	w.Write(output)
}

func main() {
	// Define the endpoint for file upload
	http.HandleFunc("/verify-file", handleFileUpload)

	// Start the HTTP server
	log.Println("Server started on port 8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
I've added comments throughout the code to explain each step, but here's a quick rundown:
- We've set up a basic endpoint at '/verify-file' that triggers the handleFileUpload function when accessed.
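- Within handleFileUpload, we read the uploaded file from the request's 'file' form field, returning a 400 Bad Request error if there isn't one. We then save the upload to a local file with os.Create, pass that file's path to the Python script through exec.Command, and return whatever the script prints as a plain-text response. Remember to replace the placeholder 'insert_python_file_validation.py' with your actual Python file name.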
Finally, to run the code and see it in action:
- Put the Python and Go files in the same directory and start the server from there. The Go code refers to the Python script by a relative path and runs it with whatever python command is on your PATH, so that interpreter needs Frictionless installed (pip install frictionless).
- In a terminal, navigate to that directory and run 'go build InsertGoFileNameHere.go', replacing 'InsertGoFileNameHere.go' with your actual Go file name.
- Once built, start the server by running the generated executable: './InsertGoFileNameHere' on Linux or macOS, or '.\InsertGoFileNameHere.exe' on Windows, again substituting your own file name.
- In a second terminal, send a POST request to the endpoint. With curl, run the following command, making sure the path after '@' points to the file you want to test (a GUI tool such as Postman works too):
```shell
curl -X POST -F "file=@./invalid.csv" http://localhost:8080/verify-file
```
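That's it! The response is the validation script's output, so for invalid.csv you should see "Validation failed! Issues found:" followed by one line per issue. If you've followed along closely, you should now see the potential of this setup and the possibilities it opens up.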