Vehicle Diagnostic Prediction Service

The following documentation describes the function, architecture, usage, and setup of the Vehicle Diagnostic Prediction Service. It also describes, in detail, the purpose and functionality of the individual software components that make up the service. For local deployment, development, and demonstration, the repository is equipped with a docker-compose file which builds each software component inside its own Docker container. As the service as a whole is made up of many individual components, using docker-compose is highly recommended to speed up local deployment. For production, docker-compose is not recommended; instead, we recommend deploying each component individually, either to a cloud-hosted compute/storage solution (depending on the component type) or to a self- or cloud-hosted production-ready orchestration service such as Kubernetes.

Purpose

The purpose of this service is to train a model and subsequently predict vehicle diagnostic DTC actions based on a reading from a vehicle and remotely fetched warranty data. The service is also capable of continuous learning, updating its internal models at configurable intervals.

The system also includes a set of demo services. These are not intended for production use; they are provided only to aid local development of the productionisable components by supplying vehicle fixture data for DTC readings and warranty data, and by making this fixture data available in an easy-to-navigate UI.

Architecture

The service as a whole can be categorised into four main sections, each containing individual software components. These sections are:

  • Demo Services
  • Prediction
  • Continuous Learning
  • Training

Demo Services

The demo services are a set of components which should not be used in a production environment. They exist solely to aid local development of the main system by providing a simple interface for creating and viewing predictions, along with a database of fixture data for vehicle DTC readings and warranty data that can also be browsed through that interface.

The demo services consist primarily of the following components:

  • Fixtures Database (PostgreSQL DB)
  • Web-based client UI (React JS Application)
  • Data-provider API (PostgREST RESTful API)

Fixtures Database

The fixtures database is a Postgres database, hosted within a Docker container, which holds a sample of the live DTC readings and warranty data. This data was queried from the AWS Athena instance provided by the client. The database exists to enable rapid querying during local development without having to query large datasets (AWS Athena) over the internet. The schema matches the Athena schema provided by the client (for maximum interoperability when switching over to a productionised environment) and contains two tables: vehicles and warranty.

The 'vehicles' table holds all of the DTC readings for a subset of data samples from Athena. The 'warranty' table holds all available warranty data for all vehicles. A 'consultation' view allows us to query how many warranty records exist for a particular consultation. A consultation groups the DTC records taken over a one-day period; the warranty data for a consultation is all of the warranty records dated on or after the consultation date but prior to the next consultation date.
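
To make the consultation/warranty relationship concrete, the sketch below queries the warranty records that fall inside a single consultation window. It is only an illustrative sketch: the column names (vin, reading_date, claim_date) and connection details are assumptions and may not match the actual fixture schema.

    # Sketch: fetch the warranty records belonging to one consultation window.
    # Assumes psycopg2 is available and that the fixture schema exposes hypothetical
    # columns vin and reading_date (vehicles) and vin and claim_date (warranty).
    import psycopg2

    def warranty_for_consultation(conn, vin, consultation_date):
        """Return warranty rows dated on/after this consultation but before the next one."""
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT w.*
                FROM warranty AS w
                WHERE w.vin = %(vin)s
                  AND w.claim_date >= %(start)s
                  AND w.claim_date < COALESCE(
                        (SELECT MIN(v.reading_date)::date
                         FROM vehicles AS v
                         WHERE v.vin = %(vin)s
                           AND v.reading_date::date > %(start)s),
                        'infinity'::date)
                """,
                {"vin": vin, "start": consultation_date},
            )
            return cur.fetchall()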

The database files are volume-mounted to the host system for data persistence. The service can interface with other containers within the system (such as the REST API) via a network bridge. The service is not accessible externally from the host system.

PostgREST API

In order for the web-based demo application to interface with the fixtures DB, we use PostgREST to expose our PostgreSQL schema as a REST API. PostgREST creates a one-to-one mapping between database tables (and views) and REST API resources, which are exposed and manipulated via auto-generated CRUD endpoints.
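
For example, the auto-generated endpoints can be queried with plain HTTP. The snippet below is a sketch that assumes the PostgREST container is exposed at http://localhost:3000 and that the resources are named vehicles and consultation; the actual port mapping and resource names are defined by the docker-compose file and the database schema.

    # Sketch: querying the auto-generated PostgREST endpoints with the requests library.
    # The base URL, port, and resource/column names are assumptions; check the
    # docker-compose file and the Swagger UI for the real values.
    import requests

    BASE_URL = "http://localhost:3000"

    # List the first ten rows of the 'vehicles' table.
    vehicles = requests.get(f"{BASE_URL}/vehicles", params={"limit": 10}).json()

    # PostgREST filter syntax: column=operator.value
    one_vehicle = requests.get(f"{BASE_URL}/vehicles", params={"vin": "eq.WVWZZZ1JZXW000001"}).json()

    # Views (such as the consultation view) are exposed in exactly the same way as tables.
    consultations = requests.get(f"{BASE_URL}/consultation", params={"limit": 10}).json()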

As with other components of the system, the PostgREST server is hosted within a Docker container and is included in the bridge network, allowing communication with the other services in the system. As an aside, for documentation and project completeness, we have also included Swagger documentation for this generated API: a Swagger container is instantiated which reads the OpenAPI spec generated by PostgREST and exposes the Swagger UI client to the host system on an unused port.

Web-based demo client

The web-based demo client (or webclient) exists solely for local system development, model testing, and demonstration purposes. It is used to display vehicle DTC, warranty, and aggregated consultation fixture data. It is also used to invoke the prediction service for any particular consultation and display the predicted DTC actions.

The webclient is a static website hosted within its own Docker container and exposed to the host machine on an unused port. The website itself is a React-based application and leverages the react-admin framework for view and navigation functionality in order to reduce boilerplate in the application code. An adapter maps requests from react-admin actions (such as viewing all items, or a single item, of a resource) to PostgREST-compatible HTTP requests, and maps the responses from PostgREST into react-admin's internal data model before they are rendered in the client. The adapter reduces boilerplate and handles complexities such as result pagination.

The webclient container does not communicate with the other services in the system over the network bridge. Instead, the client communicates with these services (such as the fixture API, PostgREST, and the prediction service) directly from the browser, as their APIs are exposed to the host on unused ports.

Prediction

Prediction Service

The prediction service exposes a REST API to the host on an unused port. It is accessed either directly or via the webclient in order to generate predictions of DTC actions from consultation data.

The predictions are generated by the latest trained model. The service is constructed such that the model is read in as a binary file from a specified directory on the host system and hydrated prior to running a prediction. This ensures that the latest model (generated by the training service) is always used for each prediction.

Prior to loading the model, the prediction service passes the request data through the data preparation service, which sanitises and normalises the DTC and warranty data into a format suitable for the model. This 'formatted' data is also stored in a database to be used for retraining at a later date.

After the model has generated a prediction, this is sent in the HTTP response to the client.
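
To make the request flow concrete, the sketch below outlines a minimal prediction endpoint. It is illustrative only: the web framework (Flask), the route and payload shape, the preparation-service URL, and the model file name are assumptions rather than the actual implementation.

    # Sketch of the prediction flow: prepare the request data, hydrate the latest
    # model from the mounted model directory, predict, and return the result.
    # Framework, paths, URLs, and payload shape are assumptions.
    import os

    import numpy as np
    import requests
    from flask import Flask, jsonify, request
    from tensorflow import keras

    app = Flask(__name__)

    MODEL_DIR = os.environ.get("MODEL_DIR", "/models")                 # volume-mounted locally
    PREPARATION_URL = os.environ.get("PREPARATION_URL", "http://preparation:5000/prepare")

    @app.route("/predict", methods=["POST"])
    def predict():
        consultation = request.get_json()

        # 1. Sanitise/normalise the raw DTC and warranty data via the preparation service.
        prepared = requests.post(PREPARATION_URL, json=consultation).json()

        # 2. Hydrate the latest h5 state file so every prediction uses the newest model.
        model = keras.models.load_model(os.path.join(MODEL_DIR, "latest.h5"))

        # 3. Run the prediction and return it to the caller.
        prediction = model.predict(np.array([prepared["features"]]))
        return jsonify({"predicted_actions": prediction.tolist()})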

Model storage

The prediction service loads a precompiled h5 state file which is hydrated in RAM and used to generate a prediction. For local development, these model files are stored on the host system and volume-mounted into the container services which invoke (prediction service) and generate (training service) them. This allows those services to read and write the model files as if they existed on the container's own filesystem, which reduces the complexity of the system under development and the number of resources required to host the models.

In a production system we expect these files to be stored on a dedicated file server or a cloud hosting provider's service, such as AWS S3. A backup solution and a RAID-1 disk configuration would be advised for self-hosted servers. The prediction and training services would both employ a transport (for example, FTP or the AWS CLI/SDK) to move the files to and from the chosen hosting service.
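
As a rough illustration of such a transport, the snippet below uses boto3 to push a newly trained state file to S3 and to pull it back down before prediction. The bucket name, key layout, and credential handling are assumptions.

    # Sketch: moving h5 state files to/from an S3 bucket with boto3.
    # The bucket name and key layout are hypothetical; credentials are expected to
    # come from the environment (IAM role, AWS_PROFILE, etc.).
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "diagnostic-models"   # hypothetical bucket name

    def publish_model(local_path: str, version: str) -> None:
        """Called by the training service after a training run completes."""
        s3.upload_file(local_path, BUCKET, f"models/{version}.h5")

    def fetch_model(version: str, local_path: str) -> str:
        """Called by the prediction service before hydrating the model."""
        s3.download_file(BUCKET, f"models/{version}.h5", local_path)
        return local_path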

Data Preparation Service

The data preparation service runs in its own container and exposes a REST API which is only accessible within the network bridge; there is no direct external connection to this service. Its primary purpose is to take user input data and sanitise and normalise it so that it can be accepted as an input to the prediction model, as described in [THE DATA PREPARATION SECTION]. The data consumed by this service may come from either a bulk load operation via the training service or an individual request from the prediction service.
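
The sketch below shows the general shape of such a preparation step. The field names and normalisation rules are placeholders, not the actual transformation described in [THE DATA PREPARATION SECTION].

    # Sketch of a sanitise/normalise step for one consultation record.
    # Field names and rules are placeholders for the real preparation logic.
    def prepare_consultation(record: dict) -> dict:
        """Return a cleaned, model-ready version of a raw consultation record."""
        dtc_codes = [code.strip().upper() for code in record.get("dtc_codes", []) if code]

        # Drop obviously invalid warranty rows and normalise the cost field.
        warranty = [
            {"claim_type": w["claim_type"].lower(), "cost": max(float(w.get("cost", 0)), 0.0)}
            for w in record.get("warranty", [])
            if w.get("claim_type")
        ]

        return {
            "vin": record["vin"].strip().upper(),
            "dtc_codes": dtc_codes,
            "warranty": warranty,
        }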

Continuous Learning

DTC & Warranty Database (DTCWDB)

The DTC and Warranty Database (DTCWDB) holds all of the data which has been used to generate predictions but has not yet been assimilated into the training set for the model. The data stored within this database is only temporary, and is removed after being passed to the training service for model refinement.

For this reason, we use a simple Postgres setup hosted within its own container. Like the data preparation service, no external network connections can be established to this service; it can only connect to other containers within the network bridge.

Scheduling Service

As model training is computationally expensive, we want to avoid running a new training job every time a new set of data is passed to the prediction service and stored in the DTCWDB. The scheduling service is designed to run at configurable intervals (for example, every day at 3:00am) and pass all of the previous period's data from the DTCWDB to the training service for model refinement. This lets us perform a single 'batch' training job at the most convenient times.

The scheduling service is very simple in principle and consists of a scheduler and a bundler, both of which run within the same container. The scheduler is simply a cron task which invokes the bundler script. It can be configured within the container prior to startup, or at runtime.

The job of the bundler is to query the DTCWDB for all of the records from the previous period and pass them to the training service via a REST API request. The bundler then removes the records it has just queried from the DTCWDB, after which the bundler process terminates, ready to be re-invoked by the scheduler's cron job.
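
A minimal bundler might look something like the sketch below. The cron expression, connection string, table and column names, and training-service endpoint are assumptions.

    # Sketch of the bundler script invoked by cron (e.g. "0 3 * * *" for 3:00am daily).
    # Connection details, table/column names, and the training endpoint are assumptions.
    import psycopg2
    import requests

    DTCWDB_DSN = "dbname=dtcwdb user=postgres host=dtcwdb"
    TRAINING_URL = "http://training:8000/train"

    def run_bundle() -> None:
        conn = psycopg2.connect(DTCWDB_DSN)
        with conn, conn.cursor() as cur:
            # Collect everything stored since the last run.
            cur.execute("SELECT id, payload FROM prepared_consultations")
            rows = cur.fetchall()
            if not rows:
                return

            # Hand the batch to the training service for model refinement.
            requests.post(TRAINING_URL, json=[payload for _, payload in rows]).raise_for_status()

            # Remove only the records that were just sent.
            cur.execute(
                "DELETE FROM prepared_consultations WHERE id = ANY(%s)",
                ([row_id for row_id, _ in rows],),
            )

    if __name__ == "__main__":
        run_bundle()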

Training

Training Service

The job of the training service is to consume training data and output an h5 state file which can be stored for later predictions. The service consumes training data via a REST API. This API is exposed on an unused host port, so it can be accessed from outside the bridge network to facilitate bulk training data imports (such as initial training or new external training data), and it can also be accessed within the bridge network by the scheduling service.

The method used for model training is described in detail in [THE MODEL TRAINING SECTION]. After a model has been trained, it is saved to an h5 state file and stored on disk, either in the volume-mounted directory (for local development) or on the external dedicated file storage (for a production system).
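
The sketch below shows the rough shape of such an endpoint: it accepts a batch of prepared records, fits a model, and writes a fresh h5 state file. The web framework, feature layout, and model architecture are placeholders, not the method described in [THE MODEL TRAINING SECTION].

    # Sketch of a training endpoint that consumes a batch of prepared records and
    # writes a new h5 state file. Framework, features, and architecture are placeholders.
    import os

    import numpy as np
    from flask import Flask, jsonify, request
    from tensorflow import keras

    app = Flask(__name__)
    MODEL_DIR = os.environ.get("MODEL_DIR", "/models")   # volume mount locally, file store in production

    def build_model(n_features: int, n_actions: int) -> keras.Model:
        model = keras.Sequential([
            keras.layers.Input(shape=(n_features,)),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(n_actions, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        return model

    @app.route("/train", methods=["POST"])
    def train():
        records = request.get_json()
        features = np.array([r["features"] for r in records])
        labels = np.array([r["action"] for r in records])

        model = build_model(features.shape[1], int(labels.max()) + 1)
        model.fit(features, labels, epochs=5, verbose=0)

        # Persist the refined model where the prediction service expects to find it.
        path = os.path.join(MODEL_DIR, "latest.h5")
        model.save(path)
        return jsonify({"model_path": path})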

Project Setup

Prerequisites

In order to run the project locally, Docker (with Docker Compose) must be installed on the host machine.

Startup

To build the containers and install the required software dependencies (for NPM and Pip), run docker-compose up. To run docker-compose in detached (daemon) mode, run docker-compose up -d instead.

The demo service should now be reachable in a web browser at http://localhost:80

To generate sample data for the demo service, an SQL migration is provided in ./docker/fixture/db.sql. Execute this script on the db service (DTCWDB) to create the vehicles and claims tables with their associated data; the script also creates the consultations view.
