Nowadays, predictive AI touches many aspects of our daily life, from estimating the waiting time at a nearby hospital to forecasting weather trends for a given city. In Africa, using historical data to predict future precipitation is especially important because it lets farmers plan their planting activities for the coming season (a three-month period).
Like farmers in many places around the world, African farmers constantly deal with unpredictable weather, which affects different areas of the continent and often results in crop loss or very poor yields. Partially solving this dilemma, by helping farmers estimate rainfall levels for the coming season, matters a great deal: misestimating future rainfall can mean financial ruin for some farmers or, even worse, a severe food shortage across an entire region.
Data scientist ... in a tight spot
Last year we were approached by an expert at a weather monitoring centre in Niamey, the capital of Niger, who was using machine learning models to predict rainfall levels for different African regions.
The problem, as he explained it, was that the ML models available in the field were limited because they did not allow exploration under different scenarios. In other words, they did not let a data scientist tweak the inputs or change the target in order to study and estimate rainfall for a given city under different conditions.
30 years’ precipitation data from NASA
The time series data used in our study came from NASA's historical records. For our purposes, the data scientist helped us navigate their website and download rainfall precipitation (in mm) for the first 90 days of the planting season, over a period of 30 years, for Abidjan, the economic capital of Côte d'Ivoire.
Overall, the time series data is straightforward. In addition to the 90 daily columns, it includes 4 more: the total rainfall for each of the 3 months and the grand total for the season as a whole. With this data in hand, we started working on a very interesting problem, one we had never encountered and had never thought of using our Flek Machine to solve.
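To make the shape of this dataset concrete, here is a minimal sketch in pandas. The rainfall values, column names, and year range are all hypothetical stand-ins (the actual NASA download is not reproduced here); it simply assumes one row per year, 90 daily columns, and the 4 derived summary columns described above, with each "month" taken as a 30-day block.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the NASA download: 30 seasons x 90 daily
# rainfall readings (mm), one row per year for Abidjan.
daily = pd.DataFrame(
    rng.gamma(shape=0.8, scale=8.0, size=(30, 90)),
    columns=[f"day_{d:02d}" for d in range(1, 91)],
    index=pd.RangeIndex(1990, 2020, name="year"),
)

# Derive the 4 summary columns described above: one total per
# 30-day month plus a grand total for the 90-day season.
data = daily.copy()
for m in range(3):
    cols = daily.columns[m * 30:(m + 1) * 30]
    data[f"month_{m + 1}_total"] = daily[cols].sum(axis=1)
data["season_total"] = daily.sum(axis=1)

print(data.shape)  # (30, 94): 90 daily columns + 4 summary columns
```

The 94-column layout matches the description above: 90 daily readings plus the three monthly totals and the season grand total.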
GoFlek ... to the rescue
What made this problem even more interesting is that, despite the apparent simplicity of the data, the number of questions a farmer can ask is huge and the kinds of predictions that need to be performed are countless.
For example, if rainfall in the first month of a given season was low and the total rainfall for the previous season fell within a given range, what will be the range of rainfall in the last month of the planting season, and for the season as a whole? With such a prediction in hand, farmers can delay their planting to the third month, saving their crops and improving overall yields.
By knowing the ideal time to plant and estimating the expected cumulative rainfall, farmers can also choose an entirely different crop to plant, reducing overall risk and helping them buy the right seeds and fertilizers just before planting time.
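A purely empirical version of the first scenario above can be sketched with historical data alone: select the past seasons that match the conditions (dry first month, previous season's total inside a band) and report the observed range of third-month rainfall in those years. This is only an illustration of the kind of query involved, not how the Flek engine computes its answers; all numbers and thresholds below are invented for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical monthly totals (mm) for 30 seasons, standing in for
# the summary columns of the NASA-derived dataset.
seasons = pd.DataFrame({
    "month_1_total": rng.normal(120.0, 30.0, 30),
    "month_3_total": rng.normal(150.0, 35.0, 30),
})
seasons["season_total"] = (
    seasons["month_1_total"] + seasons["month_3_total"]
    + rng.normal(140.0, 30.0, 30)  # hypothetical month-2 contribution
)

# Scenario conditions: first month dry (below its 25th percentile)
# AND the previous season's total inside an illustrative 350-450 mm band.
prev_total = seasons["season_total"].shift(1)
mask = (
    (seasons["month_1_total"] < seasons["month_1_total"].quantile(0.25))
    & prev_total.between(350.0, 450.0)
)

# Empirical answer: range of third-month rainfall in matching years.
matched = seasons.loc[mask, "month_3_total"]
if len(matched):
    print(f"month 3 range: {matched.min():.0f}-{matched.max():.0f} mm "
          f"({len(matched)} matching seasons)")
else:
    print("no historical seasons match this scenario")
```

With only 30 seasons, many condition combinations match few or no years, which is one reason a model-based answer is preferable to a raw historical lookup.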
Flek: forward and backward prediction
For GoFlek, with a predictive engine capable of both forward and backward prediction, running all these scenarios is straightforward, especially compared to what the data scientist had to go through with off-the-shelf predictive models. He had to build many disconnected models, each with different inputs and outputs depending on how many months ahead he needed to estimate and which of the previous months or seasons he needed as input. Moreover, for each scenario in mind he had to prepare a different dataset for modelling, training, validation and tuning.
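The overhead of the off-the-shelf route can be sketched as follows: every distinct (inputs, target) pair becomes its own fitted model with its own prepared design matrix. The three scenarios, column choices, and data below are hypothetical; ordinary least squares stands in for whatever regression models the data scientist actually used.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 30 seasons x 3 monthly rainfall totals (mm).
monthly = rng.normal(loc=[120.0, 140.0, 150.0], scale=30.0, size=(30, 3))
season_total = monthly.sum(axis=1)

# Each (inputs -> target) scenario needs its own model and its own
# prepared dataset -- the duplication described above.
scenarios = {
    "m1 -> m3":     ([0], monthly[:, 2]),
    "m1, m2 -> m3": ([0, 1], monthly[:, 2]),
    "m1 -> season": ([0], season_total),
}
models = {}
for name, (in_cols, y) in scenarios.items():
    # Ordinary least squares with an intercept column.
    X = np.column_stack([np.ones(len(monthly)), monthly[:, in_cols]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    models[name] = coef

print(len(models), "separate models for 3 scenarios")
```

Adding a new question means adding a new entry, rebuilding its dataset, and retraining, whereas a single universal probabilistic model answers any such conditional query directly.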
Overall, the data scientist was very pleased with the Flek Machine. What made his experience even more satisfying was that Flek's probabilistic model is universal and "open": once built, the user can run queries at any time, under various scenarios, without remodelling or retraining, and can trace predictions back to see how they were computed.
The entire project took about 2 weeks to complete, and then 1 week to test a number of scenarios with the data science expert.