E-Assessment and Computer-Aided Prediction Methodology for Student Admission Test Score

Machine learning is a scientific discipline in which learning means not learning by heart but recognizing complex patterns and making intelligent decisions based on data. Currently, students face the problem of selecting the most suitable university for admission in engineering. There is no predictor system that recommends the specific category best suited to a student's academic career. Students must first appear in the entry test and cannot predict whether they will pass it and be admitted to a university. To tackle this problem, the field of machine learning develops algorithms that discover knowledge from specific data and experience, based on sound statistical and computational principles. After the entry test, students struggle to rank their preferences among different categories because they lack knowledge of the intake merits of preceding years. Another problem arises when students wait for admission to a specific university while other universities finish their admission processes and select their students; some students then cannot gain admission to any university because no prediction system for university admission exists. In this work, we develop an online E-Assessment and Computer-Aided Prediction system that enables a student to predict entry test marks from matriculation and Intermediate marks and other academic data. The suggested scheme has been demonstrated to perform at high speed under a MATLAB setup.


INTRODUCTION
In today's technology-driven era, it is difficult to find any field not yet affected by advances in computer technology. Specifically, machine learning techniques enable a computer to learn from previous data sets without being explicitly programmed. Machine learning techniques and statistical tools make it easier to enhance predictive power. These techniques require skill in data analysis, with data scientists refining raw data for prediction. For data analysis, machine learning is a very powerful tool for obtaining the required results. Given suitable data sources, computing power can be put to use during the process. Predictions can be made easily by going straight to the data, which makes this one of the most significant ways of making predictions. Traditional prediction from old data is often unavailable, and retrieval approaches over old data have proven to be ineffective, time-consuming and burdensome. The inefficiency of traditional prediction searches can be addressed easily with a machine learning based predictor system. Machine learning tools open the opportunity for a decision support system (DSS). Decision support systems are computer application programs that analyze business data and present it in an efficient way, so that users can easily make their own business decisions (Chen, Chiang, & Storey, 2012). Decision support systems are essentially informational applications that work within normal business operations.

State of the literature
• Researchers have conducted studies to predict recommendations on the basis of a 'Web of Trust' with accurate suggestions (Ben-Shimon et al., 2007; Ziegler & Lausen, 2004).
• The Group Recommender System, also known as the e-group activity recommender system, suggests preferences in various domains, for example music, movies, web links, events and vacation plans (Brusilovski, Kobsa, & Nejdl, 2007; Garcia & Sebastia, 2014).
• One work discusses a G2B system, 'Smart Trade Exhibition Finder' (STEF), which predicts on the basis of a semantic similarity mechanism. Recommendations based on fuzzy relations are used to reflect graded or uncertain information in G2B techniques (Ghazanfar & Prügel-Bennett, 2014; Guo & Lu, 2007).
• Zaiane developed an e-learning agent which predicts information by working on association rule mining. This recommender system suggests courses, learning material and interesting subjects to the students of e-learning systems (Klašnja-Milićević, Ivanović, & Nanopoulos, 2015).

Contribution of this paper to the literature
• The research focuses on the accurate use of semantically relevant data of former students for mark prediction.
• The first contribution of this research is a new realistic feature regression technique that supports understanding of machine learning.
• The second contribution is finding the best predictor variables that fit the best regression line.
• The third contribution is the presentation of the results of an online prediction system that produces its results on the basis of data of former students.
Machine learning is a scientific discipline that enables the user to build a program for a system that learns automatically and improves with experience. Learning in this context is not learning by heart but recognizing complex patterns and making intelligent decisions based on data. Currently, students face the problem of selecting the most suitable university for admission in engineering. There is no suitable predictor system that recommends the specific category best suited to a student's academic career. Students must first appear in the entry test and cannot predict whether they will pass it and be admitted to a university (Koljatic & Silva, 2013). To tackle this problem, the field of machine learning develops algorithms that discover knowledge from specific data and experience, based on sound statistical and computational principles. Among related systems, 'BizSeeker' provides a list of suggested partners by calculating semantic similarities. This system was further extended and named Smart BizSeeker, based on a hybrid fuzzy semantic recommendation (HFSR) approach (Lu, Shambour, Xu, Lin, & Zhang, 2013).
After the entry test, students struggle to rank their preferences among different categories because they have no knowledge of the intake merits of preceding years. Another problem arises when students wait for admission to a specific university while other universities finish their admission processes and select their students; some students then cannot gain admission to any university because no prediction system for university admission exists (Lu et al., 2013). In this work, we develop an online decision support system that enables a student to predict entry test marks by giving age, gender, matriculation maximum marks, matriculation passing year, matriculation marks obtained, and the corresponding Intermediate data. The desired output can be achieved by effectively using a machine learning technique, namely the statistical tool of regression. The regression line is used to predict the required results. The best-fitted line is calculated using the computational mathematical tool MATLAB (Thielicke & Stamhuis, 2014). MATLAB functions were applied to previous entry test data to fit the linear regression. The system further enables students to select their preferred category and indicates whether the student is likely to be eligible for it. From the old merit lists, the system derives the best preference for the student (Hoxby & Avery, 2013). The focus of the presented work is on more consistent ways to retrieve semantically correct results efficiently. The proposed technique resolves retrieval issues through machine learning techniques, improving the required results.

RESEARCH METHODOLOGY
The model used for data collection and analysis is shown in Figure 1. This model can be applied to any type of data problem, depending on the length and the specific outcome of the dataset.

Linear Regression
Linear regression is the most common and reliable estimation technique. It takes the actual values from the user and, based on the fitted constants, produces predictions from the best-fitted line (Montgomery, Peck, & Vining, 2015). Examples of linear regression include estimating the cost of houses in a specific location, the number of sales that will be made in a specific region on the basis of old data, how many calls will be received in a specific time, and the rate of the stock exchange with its real-time ups and downs. Simple linear regression uses only two variables. The relationship between the dependent and independent variables is established by the best-fitted line, which is called the regression line. The line is represented by the linear equation
$$Y = aX + b$$
where Y is the dependent variable, a is the slope of the line, X is the independent variable and b is the intercept of the line. Y is the variable whose value is to be estimated for prediction. The slope a is calculated using the prescribed formula. The value of X is the independent value and is given by the user. The intercept of the line is also calculated by a formula.
Linear regression can be understood through the example of arranging the people in a class in increasing order of their weight when the weights are unknown. The people are arranged by relying on their heights, which can be judged visually. The arrangement is thus made using visible parameters. The heights are measured, and the correlation between height and weight is established using the regression line given above (Faraway, 2016). The coefficients a and b are calculated by minimizing the sum of squared distances between the data points and the regression line. Suppose the best line for the data has been calculated with a = 0.2811 and b = 13.9. Putting these values into the regression equation Y = aX + b gives
$$Y = 0.2811\,X + 13.9$$
With the regression line calculated, the weight of a person can be predicted by putting the height in place of X.
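The least-squares fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB code used in the study: the function names `fit_line` and `predict_weight` and the sample heights and weights are assumptions made up for the example. Only the coefficients 0.2811 and 13.9 come from the text.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return slope a and intercept b that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = y_bar - a * x_bar      # intercept
    return a, b


def predict_weight(a, b, height):
    """Evaluate the regression line Y = a * X + b at X = height."""
    return a * height + b


# Made-up heights and weights, for illustration only.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 63.0, 70.0, 74.0]
a_fit, b_fit = fit_line(heights, weights)

# Prediction with the coefficients quoted in the text (a = 0.2811, b = 13.9).
weight_at_170 = predict_weight(0.2811, 13.9, 170.0)
```

The closed-form slope is the ratio of the height-weight co-deviation sum to the squared height deviation sum, and the intercept follows from forcing the line through the point of means.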

Multiple Linear Regressions
Multiple regression is an extension of the simple linear regression technique. Simple linear regression deals with bivariate samples, that is, with only two variables: the dependent variable Y and the independent variable X. Multiple regression deals with three or more variables, of which one is the dependent variable and the others are multiple independent variables (Covrig & McConaughy, 2015). A new term is included for each independent variable, so the equation becomes
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$
The β's are the coefficients of the independent variables in the true or population equation, and the X's are the values of the independent variables for a member of the population (Nath Das & Mukhopadhyay, 2016). The intercept α or β0 is the value at which the regression plane crosses the Y axis. The slope β1 gives the predicted change in Y per unit change in X1 with X2 held constant. Likewise, the slope β2 for variable X2 gives the predicted change in Y per unit change in X2 with X1 held constant.
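As a rough sketch of how such coefficients can be obtained, the following Python code solves the normal equations for a small synthetic data set. It is not the MATLAB routine used in the study. The helper names (`solve_linear_system`, `fit_multiple`, `predict_marks`) and the data are hypothetical, and the two predictors merely stand in for inputs such as matriculation and Intermediate percentages.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]              # leading 1 for the intercept
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def predict_marks(coeffs, row):
    """Evaluate Y = b0 + b1*X1 + ... + bk*Xk for one record."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Synthetic records (X1, X2), generated exactly from Y = 5 + 0.4*X1 + 0.5*X2.
rows = [(80.0, 70.0), (60.0, 90.0), (75.0, 65.0), (90.0, 85.0), (55.0, 60.0)]
ys = [5.0 + 0.4 * x1 + 0.5 * x2 for x1, x2 in rows]
coeffs = fit_multiple(rows, ys)
estimate = predict_marks(coeffs, (70.0, 80.0))
```

Because the synthetic targets lie exactly on a plane, the fit recovers the generating coefficients up to rounding. With real records the same code returns the least-squares estimates instead.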

Significance Test
The level of significance is tested with the F test and the t-test. When a simple linear regression line is fitted, the t and F tests give the same conclusion, whereas in multiple regression the F and t tests can lead to different conclusions.
F-test: This test examines whether a significant relationship exists between the dependent variable and the set of all independent variables. The F test is also called the test for overall significance.
t-test: Whereas the F test checks the overall level of significance among the variables, the t-test determines the significance of each independent variable separately. A separate t-test is conducted for each independent variable. The t-test is also known as the test for individual significance.

Testing of F test significance
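The hypotheses for checking overall significance, for k independent variables with coefficients β1, ..., βk, are
$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 \qquad H_a: \text{at least one } \beta_i \neq 0$$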

Testing of t test for significance
The hypotheses for the t-test of an individual coefficient are
$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0$$
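For reference, the two test statistics can be written in their standard textbook form. The notation here is an assumption introduced for this sketch and does not come from the original: n is the number of observations, k the number of independent variables, \hat{Y}_j the fitted value for record j, \bar{Y} the mean of the observed values, b_i the estimate of β_i and s_{b_i} its standard error.

$$F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}, \qquad \mathrm{SSR} = \sum_{j=1}^{n} (\hat{Y}_j - \bar{Y})^2, \qquad \mathrm{SSE} = \sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$$

$$t_i = \frac{b_i}{s_{b_i}}$$

Under H0, F follows an F distribution with k and n - k - 1 degrees of freedom, and each t_i follows a t distribution with n - k - 1 degrees of freedom. H0 is rejected when the statistic exceeds the critical value at the chosen significance level.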

RESULTS AND DISCUSSION
The performance evaluation of a system provides information about the results of its characteristic measurements. The evaluation examines the system in order to improve the design of the computation and presents the results obtained by implementing and comparing the measurements. The experimental setup comprises MATLAB 2015 on the Windows 7 operating system, installed on a Core i7 machine. The proposed scheme is evaluated against standard algorithms in terms of accuracy, the precision and correctness of results, and the recall measures of the system.

Student Datasets
The dataset, comprising 5042 students, was collected from the admission section of the University of Engineering and Technology (UET), Taxila. It covers more than 20 years of records of students who were admitted to UET Taxila in different disciplines.

EXPERIMENTAL SETUP
The selected dataset is scanned and a regression line is drawn to fit the dataset values. From the multiple regression fits, a straight line is drawn. The linear regression line is then used for prediction. A portal has been built that takes the student's academic record from the user. This record is used to predict the marks the student can obtain in the entry test. In machine learning techniques, textual data is first converted into numerical data, so textual values such as Male and Female are converted to numeric values. In this case, we assigned the value 1 to male students and 2 to female students. We have 4452 records of male students and 590 records of female students. Gender disparity in this study is very natural because the collected data is of engineering disciplines. Male students have a greater tendency towards engineering than female students in Pakistan. Female students have more interest in Arts / Humanities, Social Sciences, and other science-related disciplines in Pakistan. The training data is not affected by gender differences because the feature set for the training data is not biased towards gender. The dataset also contains information on students who passed matriculation exams in different years from 1993 to 2015. This information covers 30 different boards of examination that conduct matriculation exams. Different numbers are assigned to the matriculation boards to convert them to quantitative data; these are listed in the table of values allocated to boards. The number of students is also given for information. The student has to enter the maximum marks of matriculation. The maximum depends on the specific board and may be 850 or 1050. The board-wise distribution of the students is shown in Figure 2.
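The conversion of textual attributes to numbers can be illustrated with a short Python sketch. The gender codes 1 and 2 are the ones stated above. Everything else is an assumption for illustration: the field names, the sample records, and the board codes, which are assigned here in order of first appearance and do not reproduce the values actually allocated to the 30 boards.

```python
# Gender codes as stated in the text: 1 for male, 2 for female.
GENDER_CODES = {"Male": 1, "Female": 2}


def build_board_codes(board_names):
    """Assign consecutive integers, starting at 1, to distinct board names.

    Hypothetical scheme: codes follow the order of first appearance.
    """
    codes = {}
    for name in board_names:
        if name not in codes:
            codes[name] = len(codes) + 1
    return codes


def encode_record(record, board_codes):
    """Turn one student record (a dict) into a purely numeric feature list."""
    return [
        GENDER_CODES[record["gender"]],
        board_codes[record["board"]],
        record["matric_max"],
        record["matric_obtained"],
    ]


# Made-up records for illustration only.
records = [
    {"gender": "Male", "board": "Rawalpindi", "matric_max": 850, "matric_obtained": 700},
    {"gender": "Female", "board": "Faisalabad", "matric_max": 1050, "matric_obtained": 910},
    {"gender": "Male", "board": "Rawalpindi", "matric_max": 1050, "matric_obtained": 880},
]
board_codes = build_board_codes(r["board"] for r in records)
encoded = [encode_record(r, board_codes) for r in records]
```

Integer codes of this kind impose an arbitrary ordering on the boards, which a linear model will treat as a numeric scale. The sketch follows the encoding described in the text and is not a recommendation.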
The most important inputs on which the system depends are the matriculation and higher secondary school marks. These two variables have the greatest influence on the regression. Multiple regression is used for the prediction of entry test marks. As illustrated above, the line can be represented as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$
As manual calculation of the intercept β0 and the coefficients is difficult, all the data are given to MATLAB to obtain the coefficients of the line. The resulting coefficients are shown in Table 2. These coefficients are then put into the multiple linear regression equation. The equation takes the inputs from the user and, by operating on the inputs, calculates the value of the dependent variable. To evaluate the prediction system, records from different boards were given to the system. The actual and predicted results were compared, and a graph of the prediction results was produced.
Random forests, or random decision forests, are an ensemble learning method for classification and regression that work by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes, or the mean prediction, of the individual trees. Random decision forests correct for the tendency of decision trees to overfit their training set. A random forest of 100 trees was built, each constructed by considering 4 random features. The out-of-bag error computed from the dataset is 29.84, as shown in Table 3. A dataset of 20 records for the Federal Board was taken, and their actual marks were graphed against the marks predicted by the Online Entry Test Marks Prediction System (OETMPS). This dataset is shown in Table 4.
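The idea behind the out-of-bag error can be illustrated with a deliberately simplified Python sketch. It bags 100 one-split regression trees (stumps) on a single feature, so it is a stand-in for a random forest rather than the forest used in the study: there is no random feature selection at each split, the data are synthetic, and reporting the error as a root mean squared error is an assumption, since the text does not state the metric. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature; return (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:        # last value excluded so the right side is non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:                              # all x identical: always predict the mean
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    threshold, l_mean, r_mean = stump
    return l_mean if x <= threshold else r_mean


def oob_error(xs, ys, n_trees=100, seed=0):
    """Root mean squared out-of-bag error of a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n = len(xs)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        stump = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:                         # record i was not used to train this tree
                oob_preds[i].append(stump_predict(stump, xs[i]))
    sq_errors = [(mean(p) - ys[i]) ** 2 for i, p in enumerate(oob_preds) if p]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Synthetic data for illustration only.
xs = [float(x) for x in range(1, 21)]
ys = [2.0 * x + 3.0 for x in xs]
err = oob_error(xs, ys, n_trees=100, seed=0)
```

Each record is scored only by the trees whose bootstrap sample did not contain it, which gives an error estimate without a separate validation set.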
In the graph of Figure 3, it can be seen that the actual and predicted marks are close to each other. The differences between the predicted and actual entry test marks are represented, and the median of the differences is 12.965. Another part of the dataset, for 20 students of the Rawalpindi Board, is presented in tabular form in Table 5.
In the graph of Figure 4, it can be seen that the actual and predicted marks are close to each other. The differences between the predicted and actual entry test marks are represented, and the median of the differences is 11.81. Another dataset of 20 students of the Faisalabad Board has been selected from the entire dataset of students; it is shown in Table 6.
Figure 5 shows the comparison of the marks predicted by the system with the actual entry test marks obtained by the students of BISE Faisalabad. The differences between the predicted and actual entry test marks are represented, and the median of the differences is 9.735. The dataset of Karachi Board students is given in Table 7.
Figure 6 shows the comparison of the marks predicted by the system with the actual entry test marks obtained by the students of BISE Karachi. Twenty students have been selected from the Sargodha Board, and their respective marks are shown in Table 8. Figure 7 shows the comparison of the marks predicted by the system with the actual entry test marks obtained by the students of BISE Sargodha. The differences between the predicted and actual entry test marks are represented, and the median of the differences is 10.475.
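The per-board summary used above, the median of the differences between actual and predicted marks, can be computed as in the following Python sketch. The text does not say whether signed or absolute differences were used, so the use of absolute differences here is an assumption, as are the function name and the sample marks.

```python
from statistics import median


def median_abs_difference(actual, predicted):
    """Median of |actual - predicted| over paired marks."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return median(abs(a - p) for a, p in zip(actual, predicted))


# Made-up marks for four students, for illustration only.
actual_marks = [60, 72, 55, 80]
predicted_marks = [65, 70, 50, 90]
summary = median_abs_difference(actual_marks, predicted_marks)
```

The median is less sensitive than the mean to a few students whose test-day performance departs strongly from their academic record.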
The system has been developed and is currently running online on the website of the University of Engineering and Technology, Taxila. Screenshots of the running system are shown in Figures 8 and 9. The computer-aided prediction system, named the Online Entry Test Marks Prediction System (OETMPS), is used for calculating a student's entry test marks.
It also computes the aggregate percentage that is possible under the university's admission rules and regulations, as shown in Figure 9. This aggregate percentage is further used to identify the disciplines in which the student may be able to take admission.

Factors Beyond System
Some factors were not considered in the implementation of the system. These factors directly or indirectly affect the performance of students in the entry test. Some students enroll in academies to prepare for the entry test and can obtain higher marks than students who do not attend any academy. As a result, their chances of obtaining better marks in the entry test are higher than those of other students. Another factor may be that a student reaches the test venue after the entry test has started, which makes it impossible to solve the entire test in the limited time.
A student may therefore have a better percentage in his or her previous academic career and still, because of such factors, be unable to obtain better entry test marks.

CONCLUSION AND FUTURE WORK
The focus of this research was to present a new entry test predictor system built on statistical tools. The research not only predicts the entry test marks but also helps students choose the best option for their career on the basis of previous results. This section highlights the most important results of this research. The purpose of this study is to overcome the limitations and drawbacks of existing methods. The presented technique offers a new procedure for predicting test results using statistical tools. The linear regression presented in the proposed work provides the line that best fits the provided data. The main motivation of this research is to help fresh graduates who are about to enter universities. The proposed features are classified through a machine learning technique to improve performance. In future, the proposed solution will be made more generic by collecting datasets from most technical universities. The technique will be extended so that it can be applied to fields such as Medical Science and Basic Sciences. Other factors will also be included, for example that a student is not mentally prepared for the test for any reason and cannot give it full attention, which affects the entry test marks. Some students reach the test venue after the start of the test and cannot give full attention to their paper. Some students do not have the material needed to fill in the test, and as a result their test is canceled. There is valuable scope in the field of machine learning for developing methods that improve the performance of entry test marks prediction.

Figure 1. Model for the assessment of the Entry Test and Prediction Model

Figure 2. Board wise distribution for admitted Students

Figure 3. Analysis of Predicted and Actual Marks for the Islamabad Board

Figure 4. Analysis of Predicted and Actual Marks for the Rawalpindi Board

Figure 5. Analysis of Predicted and Actual Marks for Faisalabad Board

Figure 6. Comparison Analysis of Predicted and Actual Marks for Karachi Board

Figure 7. Analysis of Predicted and Actual Marks for Sargodha Board

Figure 8. Student data entry form used for the assessment of entry test marks, aggregate and admission
Figure 9.

NOTES
1. Inter Board Committee of Chairmen (IBCC)
2. Punjab Board of Technical Education (PBTE)
Gender disparity in this study is very natural because the collected dataset is of UET and engineering disciplines. Male students have a greater tendency towards engineering than female students in Pakistan. Female students have more interest in Management Sciences, Arts / Humanities, Social Sciences, and other science-related disciplines in Pakistan. The training data is not affected by gender differences because the feature set for the training data is not biased towards gender. Moreover, research has also negated the use of convenience sampling. These data are used to estimate the results of students in the entry test. Eleven attributes that have a direct impact on the entry test marks have been taken. The data are first organized as required. Machine learning techniques are used for predicting the best results. Multiple regression tools are used for prediction, implemented in MATLAB 2015.

Table 1. Values allocated to boards

Table 2. Coefficient with Values

Table 3. Attribute Values generated by the Random Forest and Random Tree

Table 4. Students marks of Islamabad Board

Table 5. Students marks of Rawalpindi Board

Table 6. Dataset of Faisalabad Board Students

Table 7. Dataset of Karachi Board Students

Table 8. Dataset of Sargodha Board Students