Improving the Basics of GIS Students' Specialism by Means of Application of the ESDA Method

University education should emphasize cultivating students' capabilities so that they are qualified to meet the requirements of the job market. In addition to general abilities and qualities, university students majoring in GIS are expected to master basic GIS theory and cutting-edge technologies, GIS software operation, data collection and processing, spatial data modeling and analysis, independent learning, and GIS scientific research and innovation. Taking the spatial analysis of Fujian telecommunications consumption data as an example, this paper introduces ESDA, an analytic method, into research on telecommunications consumption. The approach combines perceptual and conceptual knowledge with qualitative and quantitative analysis, so as to improve the competency of GIS majors, cultivate their spatial analysis capability, and develop sound thinking habits, thereby laying a solid basis for their future study and work.


INTRODUCTION
To meet the needs of interdisciplinary integration and practical teaching, teachers of all disciplines in modern colleges and universities should constantly enrich their theoretical knowledge of data analysis and their methods for teaching it. Through integrated teaching, the theory and methods of data analysis can be built into the teaching and practical activities of each subject. When the training of data analysis ability runs through the whole teaching process, students gradually improve that ability, cultivate a systematic way of thinking about data analysis, and strengthen their comprehensive quality and professional competitiveness over their four years at university. This paper takes the analysis of Fujian communication consumption data as an example, applies spatial autocorrelation analysis at the county scale, and explores the relationship between the spatial differentiation of communication consumption and economic development. This not only helps GIS students master the basic knowledge of spatial autocorrelation, but also cultivates their spatial analysis ability, develops sound thinking habits, and lays a solid foundation for future study and work.
Spatial autocorrelation is a specialized and very effective technique for answering questions about, for example, the spatial distribution of phenomena. Around 1950, Moran's spatial analysis of biological phenomena extended the correlation coefficient from one-dimensional to two-dimensional space, thus defining Moran's index, also called Moran's I (Moran, 1950). Shortly thereafter, Geary proposed the Geary coefficient, analogous to the Durbin-Watson statistic of regression analysis (Geary, 1954). Over the following decades, through the efforts of a large number of geographers, especially the work of Cliff and Ord, spatial autocorrelation gradually developed into one of the important topics in geospatial analysis. Another prominent theme is Wilson's spatial interaction theory and model. On the basis of Moran's index and the Geary coefficient, Anselin developed a local analysis method for spatial autocorrelation. Getis et al. proposed a spatial association index based on distance statistics (Getis & Ord, 1992). In particular, the creation of the Moran scatter plot represents significant progress in spatial autocorrelation analysis. In China, research papers and books on spatial autocorrelation are on the rise, covering theories, methods and techniques, with a growing body of practice and application (Tobler, 2014). The rest of the paper is organized as follows. Section 2 introduces the research methods, namely global autocorrelation and local autocorrelation. The data sources and preprocessing are presented in Section 3. Section 4 describes the analysis of the case. Finally, conclusions are presented in Section 5.

Exploratory Spatial Data Analysis (ESDA)
Following Anselin (1998), ESDA (exploratory spatial data analysis) is a collection of techniques to describe and visualize spatial distributions (Anselin, 1999; Anselin, Sridharan, & Gholston, 2007); identify atypical locations or spatial outliers; discover patterns of spatial association, clusters or hot spots; and suggest spatial regimes or other forms of spatial heterogeneity. Central to this conceptualization is the notion of spatial autocorrelation or spatial association, i.e., the phenomenon where locational similarity (observations in spatial proximity) is matched by value similarity (attribute correlation). True ESDA pays attention to both spatial and attribute association. ESDA is a subset of EDA (exploratory data analysis) methods that focuses on the distinguishing characteristics of geographical data, specifically on spatial autocorrelation and spatial heterogeneity. In EDA, graphical and visual methods are used to identify data properties for the purpose of pattern detection. ESDA techniques can help detect spatial patterns in data, lead to the formulation of hypotheses based on the geography of the data, and help assess spatial models. ESDA requires that numerical and graphical procedures be linked with a map.
ESDA focuses on the description and interpretation of the spatial relationships of regionalized variables, particularly spatial autocorrelation and spatial heterogeneity. An ESDA analysis includes the construction of a spatial weight matrix, the Moran scatterplot, a global measure of spatial autocorrelation, and the identification of local spatial correlation.

Global Autocorrelation
Spatial autocorrelation (SA) describes the potential interdependence among observations of a variable within the same distribution area (Yang & Wong, 2013; Zhang et al., 2011). Here it is used to study whether the communication consumption expenditure of the counties covered by the 9 cities in Fujian Province is affected by neighboring regions. The study calculates both global and local indexes of the spatial autocorrelation of communication consumption expenditure. The global index, Moran's I, detects the spatial distribution pattern of consumption as a whole and reflects the overall degree of autocorrelation in the region, while the local Moran's I is computed for each spatial unit to determine how strongly its consumption is related to that of adjacent units. If neighboring regions have similar values, spatial autocorrelation is strong; otherwise it is weak. The formula is:

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where n is the number of counties, i and j index two counties, x is communication consumption expenditure (i.e., the phone bill), and $\bar{x}$ is its mean. $w_{ij}$ is the spatial weight matrix, which represents the proximity between spatial positions i and j. Adjacency is determined from the Euclidean distance between county points: when the distance is within the set threshold, $w_{ij} = 1$; otherwise, $w_{ij} = 0$. At a given significance level, a significantly positive Moran's I indicates that counties with higher (or lower) communication consumption cluster in space; the closer the value is to 1, the smaller the overall spatial difference. Conversely, a significantly negative Moran's I indicates a significant spatial difference between the communication consumption of a county and that of its surrounding counties; the closer the value is to -1, the larger the overall spatial difference. Only when Moran's I is close to its expected value, -1/(n-1), are the observed values independent of each other and randomly distributed in space.
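As an illustration of the global index described in this section, the following Python sketch builds a binary distance-band weight matrix and computes Moran's I. It is a minimal sketch under stated assumptions, not the ArcGIS procedure used in the study: the function names, the four hypothetical county points and the expenditure values are invented for the example, and the weights are set to 1 when two points lie within the threshold distance.

```python
import math

def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 if i != j and the Euclidean distance
    between points i and j is within the threshold, otherwise 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                w[i][j] = 1.0
    return w

def morans_i(x, w):
    """Global Moran's I:
    (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)

# Hypothetical example: four county points on a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = distance_band_weights(points, threshold=1.0)

clustered = [1.0, 2.0, 3.0, 4.0]    # similar values are neighbours
alternating = [1.0, 4.0, 1.0, 4.0]  # dissimilar values are neighbours

print(morans_i(clustered, w))    # positive: neighbours resemble each other
print(morans_i(alternating, w))  # negative: neighbours differ
```

In this toy case the smoothly increasing series gives a positive index and the alternating series gives a negative one, which matches the interpretation of the sign of Moran's I given in the text.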

Contribution of this paper to the literature
• The main contribution of this paper to the literature concerns how the level of economic development influences residents' willingness to spend money on telecommunications.
• The special issue shows the need for diverse research approaches. The broad range of research foci for statistical methods of fostering mathematics learning, under a classroom learning perspective and under a professional development perspective, goes hand in hand with a broad range of approaches.

Local Autocorrelation
Global spatial autocorrelation analysis measures inter-region spatial difference and correlation as a whole, and reflects the average degree of clustering of similar values within the region. However, when the sample is large, the global statistic may mask the behavior of subsets of the data. To uncover local clusters of high or low values, or anomalous patterns, the local spatial autocorrelation method is introduced. The global Moran's I statistic (Cervero & Kang, 2011; Su, Xiao, Jiang, & Zhang, 2012) is a general statistical index indicating the average degree of spatial difference between counties. Even when the overall spatial difference between districts and counties is shrinking, the spatial difference within a local area may be growing. To reflect the trend of spatial differences in communication consumption, the ESDA local analysis method is therefore also needed. To identify local spatial autocorrelation, a local statistic is computed for each spatial position. The local Moran's I for position i is:

$$I_i = z_i \sum_{j \neq i} w_{ij} z_j$$

where $z_i$ and $z_j$ are the standardized (mean-centered) observations for districts i and j, and $w_{ij}$ is the spatial weight. The standardized statistic for testing the local Moran index is:

$$Z(I_i) = \frac{I_i - E(I_i)}{\sqrt{\mathrm{Var}(I_i)}}$$

where $E(I_i)$ and $\mathrm{Var}(I_i)$ are the theoretical expectation and theoretical variance of $I_i$. If $I_i > 0$ and $z_i > 0$, district i is located in the H-H quadrant, which indicates that the communication consumption of the district and of its surrounding counties is high and the spatial difference between them is small. If $I_i > 0$ and $z_i < 0$, district i is located in the L-L quadrant, which indicates that the communication consumption of the district and of its surrounding counties is low and the spatial difference between them is small. If $I_i < 0$ and $z_i > 0$, district i is located in the H-L quadrant, which indicates that the communication consumption of the district is high while that of the surrounding counties is low, and the spatial difference between them is large. If $I_i < 0$ and $z_i < 0$, district i is located in the L-H quadrant, which indicates that the communication consumption of the district is low while that of the surrounding counties is high, and the spatial difference between them is large.
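The quadrant rule in this section can be sketched in Python as follows. This is an illustrative sketch rather than the study's implementation: it assumes row-standardized weights (each unit's neighbors are averaged), standardizes the observations with the population standard deviation, and uses a hypothetical chain of five districts. It only assigns quadrants and does not test significance.

```python
import math

def local_morans_i(x, w):
    """Return a list of (I_i, z_i) pairs, where z_i is the standardized
    value of unit i and I_i = z_i * (weighted mean of neighbouring z_j).
    Assumption: weights are row-standardized inside this function."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if sd == 0:
        raise ValueError("all observations are equal; z-scores are undefined")
    z = [(v - mean) / sd for v in x]
    results = []
    for i in range(n):
        row_sum = sum(w[i])
        lag = sum(w[i][j] * z[j] for j in range(n)) / row_sum if row_sum else 0.0
        results.append((z[i] * lag, z[i]))
    return results

def quadrant(local_i, z_i):
    """Map the signs of I_i and z_i to the Moran scatterplot quadrant."""
    if local_i > 0 and z_i > 0:
        return "HH"
    if local_i > 0 and z_i < 0:
        return "LL"
    if local_i < 0 and z_i > 0:
        return "HL"
    if local_i < 0 and z_i < 0:
        return "LH"
    return "undefined"

# Hypothetical chain of five districts; each is adjacent to the next one.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
spending = [10.0, 10.0, 0.0, 0.0, 0.0]

labels = [quadrant(i_val, z_val) for i_val, z_val in local_morans_i(spending, chain)]
print(labels)  # ['HH', 'HH', 'LH', 'LL', 'LL']
```

The third district is low but its neighbors average out high, so it is labeled LH; in a full LISA analysis such labels would be reported only for units whose local statistic is significant.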

GIS Spatial Analysis
GIS stands for Geographic Information System. Spatial analysis is one of the core functions of a geographic information system (Sánchez-García, Canga, Tolosana, & Majada, 2015; Wang & Chen, 2015). It extracts, represents and transmits geographic information (especially implicit information), which is the main functional characteristic that distinguishes a geographic information system from a general information system. In this paper, GIS spatial analysis technology (Gimpel et al., 2015) is used to analyze the clustering of GDP and of urban residents' communication consumption in the 84 counties of Fujian Province, and regional distribution maps are produced to explore regional differences. The global Moran's I statistic was used for spatial autocorrelation analysis in ArcGIS 10.0. Moran's I, the p value and the Z score were calculated to test the tendency toward spatial clustering of urban residents' communication consumption and of GDP. A confidence level of 99% was selected, and values of p < 0.01 were considered statistically significant.
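For readers without access to ArcGIS, the significance test described above can be approximated by random permutation. The Python sketch below is an illustration under stated assumptions and is not the analytical test used in the study: it shuffles the observed values across locations, recomputes Moran's I each time, and derives a Z score and a one-sided pseudo p value from the simulated distribution. The chain-shaped adjacency, the values and the function names are hypothetical.

```python
import random
import statistics

def morans_i(x, w):
    """Global Moran's I for values x and weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))

def permutation_test(x, w, permutations=999, seed=0):
    """Return (observed I, Z score, one-sided pseudo p value).
    The p value is the share of permutations with I >= observed,
    counting the observed arrangement itself as one draw."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(shuffled)
        simulated.append(morans_i(shuffled, w))
    z = (observed - statistics.mean(simulated)) / statistics.pstdev(simulated)
    extreme = sum(1 for s in simulated if s >= observed)
    p = (extreme + 1) / (permutations + 1)
    return observed, z, p

# Hypothetical data: 20 units in a chain with steadily increasing values.
n = 20
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
x = [float(v) for v in range(1, n + 1)]

observed, z, p = permutation_test(x, w)
print(round(observed, 4), z > 2.58, p < 0.01)
```

With 999 permutations the smallest attainable pseudo p value is 0.001, so the number of permutations limits how strict a significance level can be tested.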

Study Area and Data Source
Fujian Province is located on the southeastern coast of China, facing Taiwan across the Taiwan Straits. The province is mostly mountainous, and is traditionally described as "Eight Parts Mountain, one part water, and one part farmland". The northwest is higher in altitude, with the Wuyi Mountains forming the border between Fujian and Jiangxi. It is the most forested provincial-level administrative region in China, with a 65.95% forest coverage rate in 2013. The highest point of Fujian is Mount Huanggang in the Wuyi Mountains, with an altitude of 2157 m. Fujian has a subtropical climate, with mild winters. In January the coastal regions average around 7-10 °C (45-50 °F) while the hills average 6-8 °C (43-46 °F). In the summer, temperatures are high, and the province is threatened by typhoons coming in from the Pacific. Average annual precipitation is 1,400-2,000 millimeters (55-79 in). Although Fujian is one of the wealthier provinces of China, its GDP (Gross Domestic Product) per capita is only about the average of China's coastal administrative divisions. In 2011, Fujian's nominal GDP was 1.74 trillion Yuan, a rise of 13 percent from the previous year. Its GDP per capita was 46,802 Yuan. By 2015 Fujian expects to have at least 50 enterprises with over 10 billion RMB in annual revenues. The government also expects 55 percent of GDP growth to come from the industrial sector.
To better reveal the differences in economic space between counties in Fujian Province, the spatial scale of the exploratory spatial data analysis is defined as the county level, covering 9 prefecture-level cities and a total of 84 districts and counties. China's 1:1 million county-level administrative division electronic map is used as the base graphic data, and GIS technology is applied to visualize and study the relevant statistical data. Masked (anonymized) payment records from operator agent offices (dealers) for January to March 2017 were extracted from the Fujian operator system. After preprocessing (for example, deleting records with irrational monthly consumption behavior or incomplete fields), a total of 4,307,570 qualified records were obtained. The GDP data come mainly from the "Fujian Statistical Yearbook - 2016". The data distribution is shown in Figure 1 and Figure 2.
Figure 1 and Figure 2 show that the volume of communication consumer spending does not differ significantly between months. Trading volume peaks at the beginning and the end of each month. The operator system conducts centralized billing at the beginning and the end of the month, which leaves large numbers of customers with no phone balance at the end of the month, so they need to top up. In terms of the number of transactions, the daily trend is basically the same: the peak occurs between 9 and 12 o'clock, with another small peak at 15 o'clock in the afternoon.

Empirical Analysis
It is well known that the 3σ rule and the z-score method assume that the data follow a normal distribution, but actual data often do not strictly do so. Their criteria for judging outliers are based on the mean and standard deviation of the batch of data, which have very little robustness: the outliers themselves strongly affect them, with the result that the number of outliers identified does not exceed 0.7%. The effectiveness of these methods is therefore limited for non-normal data. A box plot, by contrast, relies only on the actual data and does not require a prior assumption that the data follow a specific distribution. Without any restrictive requirements on the data, it gives a true and intuitive representation of their original shape. The box-plot criterion for outliers is based on the quartiles and the interquartile range (IQR). Quartiles are robust to a certain degree: up to 25% of the data can become arbitrarily distant without disturbing the quartiles, so outliers cannot distort this criterion, and the outliers identified by a box plot are more objective. Therefore, to improve the accuracy of the research, we use box plots to find outliers: values smaller than the lower quartile Q1 minus 1.5 × IQR, or larger than the upper quartile Q3 plus 1.5 × IQR, where IQR = Q3 - Q1, are treated as outliers. The box plots of communication consumption of Fujian counties from January to March 2017 are shown in Figure 3. For example, we remove the 3 outliers from Quanzhou and Zhangzhou in 201701 (Figure 3 (a)). After removing the outliers, we explore the relationship between communication consumption and GDP. Because the variables have different dimensions, they are uniformly normalized to the range from 0 to 1, and the contrast curves are then drawn, as shown in Figure 4.
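The box-plot rule and the 0 to 1 normalization described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the sample values and function names are hypothetical, and the quartiles use the "inclusive" interpolation method of the standard library, which may differ slightly from the quartile convention of a particular plotting tool.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate values inside the fences from box-plot outliers."""
    low, high = iqr_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical monthly spending for seven counties; 95 is an extreme value.
spending = [10, 12, 11, 13, 12, 11, 95]

kept, outliers = split_outliers(spending)
scaled = min_max_normalize(kept)

print(iqr_fences(spending))  # (8.75, 14.75)
print(outliers)              # [95]
print(scaled)
```

Normalizing after outlier removal matters: if the extreme value were kept, the remaining values would be compressed near 0 and the comparison with normalized GDP would be hard to read.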
Figure 4 shows that in counties with higher GDP, residents' average communication expenditure is also higher, which indicates that communication consumption expenditure in the counties of the province is positively correlated with GDP. To further clarify the regional differences between the 84 counties in Fujian Province, the GDP and the communication consumption index of each county in 2017 were clustered separately. The computed Moran's I values of communication consumption expenditure from January to March 2017 are all above 0.48 at the significance level of p<0.001, which denotes a significant positive spatial correlation. This means that counties with high values are spatially adjacent to counties with high values, and counties with low values are spatially adjacent to counties with low values.
If we want to examine the correlation between multiple variables at the same time, drawing a separate scatter plot for each pair is cumbersome. A scatterplot matrix plots the scatter plots of all variable pairs together, so that the main correlations between variables can be found quickly. Each cell at the intersection of two variables holds their scatter plot, and the plots above and below the main diagonal mirror each other, so only the upper or lower triangle needs to be shown, which can be set by adjusting parameters. The main diagonal shows kernel density plots, while the other cells include linear or smoothed fitted curves.
We explore the relationship between monthly communication expenses and other factors, including GDP, number of transactions, and regional length (which can be used as a proxy for regional area). As can be seen from Figure 5, monthly communication expenditure follows a single-peak curve, and each predictor variable is skewed to some degree. Communication consumption increases with GDP, and the trend for the number of transactions is basically the same, but consumption decreases somewhat as area increases. This shows that regions with a higher economic level have correspondingly higher communication consumption. The decrease with regional area arises because the western counties of Fujian Province are larger, their economy is relatively backward, and their communication volume is smaller. When a distance matrix is applied, an appropriate distance threshold should be chosen to obtain an appropriate number of adjacent regions, thereby better describing the distribution of the data. If the distance threshold is too large or too small, the resulting spatial autocorrelation may be insignificant.
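The effect of the distance threshold on the number of adjacent regions can be checked before any autocorrelation statistic is computed. The Python sketch below is a hypothetical illustration: it uses a 3 by 3 grid of points one unit apart instead of the Fujian county points, and the function name is invented for the example.

```python
import math

def neighbor_counts(points, threshold):
    """For each point, count the other points within the threshold distance."""
    counts = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= threshold
        )
        counts.append(count)
    return counts

# Hypothetical 3 x 3 grid of county points, listed row by row.
grid = [(float(col), float(row)) for row in range(3) for col in range(3)]

for threshold in (0.5, 1.0, 1.5, 3.0):
    counts = neighbor_counts(grid, threshold)
    isolated = counts.count(0)
    print(threshold, min(counts), max(counts), isolated)
```

A threshold that is too small leaves units with no neighbors at all, while one that is too large makes every unit a neighbor of every other, so the weight matrix no longer distinguishes near from far. Either extreme tends to weaken the measured autocorrelation, which is consistent with the caution in the text.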
Table 1 and Figure 6 show that, in each of the three months in 2017, only a very small number of counties and cities fall in the HH, LH and LL quadrants, and most are not significant. The "high-high" areas are also referred to as "hot spots", whereas the "low-low" areas are referred to as "cold spots". Significant differences were found in the local evolution of the distribution pattern of average communication expenditure in Fujian over the three months. The communication consumption expenditure of county residents is spatially correlated, which manifests as follows: communication expenditure in Gulou District, Jimei District, Licheng District and along the east coast is highly concentrated, while the northwest region has a low degree of concentration. At the same time, there is spatial heterogeneity and nonstationarity, that is, urban residents' communication consumption expenditure differs markedly between areas. The LISA (Local Indicators of Spatial Association) results in Figure 4 show that counties with a p value equal to 0.05, such as Jinan District and the Changle area, have significant high-value agglomeration, that is, both local and surrounding communication consumption expenditure are high. There is no region with a p value less than 0.0001 and no significant low-concentration area, which indicates that there is a certain correlation between each county's consumption and GDP. "Hot spots" and "cold spots" are both very prominent. Seven areas, namely Lianjiang County, Changle City, Jinan District, Minhou County, Fuqing City, Xianyou County and the Xiuyu District of Putian City, cluster as HH-type; they are the "hot spot" areas where communication consumption expenditure is concentrated. There are 16 "cold spot" areas clustering as LL-type, mainly located on the northwestern edge, such as Ninghua County, Jianning County, Taining County, Shaowu City and Guangze County. In addition, there are 3 distinct "singular value" counties that contrast with their surrounding counties and are considered significant HL- or LH-type heterogeneous regions, such as Yongan City, Luoyuan County and Pingtan County.
Figure 7 shows that the spatial distribution of urban residents' communication expenditure in the counties of Fujian Province is not scattered but has an inherent regularity, namely positive spatial autocorrelation. This regularity appears as clustering of communication consumption expenditure in space: high values tend to be adjacent to high values and low values to low values, which indicates that counties with higher consumption ability lie relatively close to other counties with higher consumption ability, and counties with lower consumption ability lie relatively close to other counties with lower consumption ability. Comparing the county GDP distribution with the monthly communication consumption expenditure shows that an increase in GDP leads, to some degree, to an increase in residents' communication consumption expenditure, so there is a certain correlation between GDP and communication consumption. However, communication consumption expenditure and the degree of economic development are not perfectly positively related, because the consuming ability and spending tendency of different groups differ, which produces significant differences in their communication expenditure. For example, the Yanping District of Nanping City and Yongan City have high GDP, yet communication consumption expenditure there and in the surrounding counties and districts is low. Meanwhile, in western Fujian, residents' consumption expenditure still has considerable room to grow, and consumer demand for communications has not been fully released. As income levels rise, communication expenditure per person will increase, and the communication market still has great potential. This paper uses a three-month period as its time scale; a longer period would make the research results more precise and convincing. The study units are the county-level areas of Fujian Province; whether finer, township-level study units would lead to different results is left for future discussion.

CONCLUSIONS
In the information age, there is a higher demand for talents with data analysis ability, which differs from the requirements for cultivating data analysis ability in teaching. Students in modern specialties need a certain ability to analyze and organize data in order to meet the needs of collecting, organizing and analyzing all kinds of data in actual work. Therefore, modern colleges and universities must recognize the importance of training students' data analysis ability, carry the training of this ability through the whole teaching process, and build the theory and practice of data analysis into the teaching of each subject. On the basis of subject content, we should carry out practical teaching activities, improve students' data analysis ability, and promote students' comprehensive quality and employment competitiveness.
Geography not only studies the spatial distribution and characteristics of geographical things, but also clarifies spatial differences and spatial relations. It is devoted to revealing the spatial movement of geographical things and the laws of their spatial change. GIS can help learners develop spatial ability, solve geospatial problems, and improve their spatial reasoning ability and level of spatial thinking. In this paper, we use exploratory data analysis, a spatial weight matrix, scatterplots, global spatial autocorrelation analysis, and local spatial autocorrelation analysis to study Fujian communication consumption expenditure, and to explore and analyze some of its characteristics and anomalies. Since all kinds of data taken from the real world exist at a certain spatial location, it is difficult to satisfy the idealized condition that observations of a variable at adjacent spatial positions are mutually independent. Therefore, the application and development of spatial autocorrelation index analysis in the field of the national economy has very broad prospects. For students majoring in GIS, mastering this analysis method is of great significance for learning geography, understanding the relationship between people and the environment, and solving geographical problems. However, using the ESDA method to cultivate learners' spatial ability also needs curriculum support, so it is necessary to develop a high-quality and challenging GIS classroom learning model.

Figure 1. Average daily data distribution of communication consumer expenditure
(a) Communication expenditure in 2017-01 (b) Communication expenditure in 2017-02 (c) Communication expenditure in 2017-03
Figure 3. Communication Consumption of the Resident in Fujian County

Figure 4. The Relationship between Communication Expenditure and GDP

Figure 7. The autocorrelation clustering of communication expenditure
(a) Distribution of County GDP in 2016 (b) Distribution of communication consumption in January (c) Communication consumption distribution in February (d) Communication consumption distribution in March
Figure 8. The relationship between consumer spending and GDP