Monday, August 22, 2022

[Neustar] SDO_POINTINPOLYGON for fast, high-volume spatial processing

Many location intelligence and geospatial applications need to perform high volumes of spatial queries daily – with fast performance to service critical business applications. Neustar operates a cloud-based analytics platform called ElementOne, which provides real-time lead scoring, custom segmentation, and market analytics for its SaaS customers worldwide. The ElementOne platform runs 24/7, conflating data from 200-plus sources.

Read More: 1Z0-1093-21: Oracle Cloud Database Services 2021 Specialist

ElementOne combines spatial analysis with sophisticated analytics and predictive modeling. This gives clients real-time insights into the effectiveness of their marketing programs and store siting plans. Because the data is sensitive and needs to be available on demand, workflows must be highly secure, scalable, available, and reliable. Read more about Neustar’s business challenges and other Oracle Database 19c features they use here.

In this blog post, Nick Salem, Distinguished Engineer at Neustar, describes a specific Oracle Spatial function and Oracle Database 19c global temporary tables, and how they help Neustar deliver location analytics quickly, at scale. Nick shows how to use the SDO_POINTINPOLYGON function to find all customers within a certain distance of a sales trade area, and shares performance results using large customer point data sets.

--

The SDO_POINTINPOLYGON function API is a great feature that was first released in Oracle Database 12c. There is a nice blog post that explains how this feature can be used to address the challenges of ingesting large amounts of spatial data where you can handle the loading and querying of large spatial data sets without the overhead associated with creating and maintaining a spatial index. The post shows how SDO_POINTINTPOLYGON can really benefit massive scale operations, such as those using Exadata environments.

In this post, I would like to cover some other benefits that the SDO_POINTINPOLYGON feature provides that can be very helpful – especially for applications servicing a large number of concurrent spatial operations. This can greatly improve performance for such applications that run on either Exadata or non-Exadata environments. The fact that the  SDO_POINTINPOLYGON does not use a spatial index means that you can leverage data stored in an external table, global temporary table and now in Oracle Database 19c, these can also include hybrid partitioned tables, private temporary tables or JSON datasets to perform spatial point-in-polygon queries. Global temporary tables are great for multi-session environments because every user session has their own version of the data for the same global temporary table, without any contention or row locking conflicts between sessions. Furthermore, since Oracle Database 12c, there have been some major performance optimizations to global temporary tables that result in substantially lower redo and undo generation. You will need to make sure the system parameter temp_undo_enabled is set to TRUE to ensure that the global temporary tables optimization is fully in effect.

Below is a screenshot from Neustar’s ElementOne platform with a map showing a trade area and set of uploaded customer points.

Core Java, Java Programming, Oacle Java Career, Java Skills, Java Jobs, Java Tutorial and Materials, Java Cloud, Oacle Java Certification, Java Preparation Exam

At Neustar, our clients work with a lot of transient work data as part of a multi-step process for various spatial and analytical use cases. Let’s put together a quick PL/SQL script that you can use to test drive the power of the SDO_POINTINPOLYGON function. Here, I use a simple polygon in the San Diego area and generate a set of random customer points in and around the polygon. Then, I populate the global temporary table. The script is configurable: you can increase or decrease the number of randomly generated customer points, and how far from the polygon centroid you may want to allow points to extend to. Once you have the data populated, you can run the SDO_POINTINPOLYGON queries in serial or in parallel, or change some of the optional MASK parameters. Here’s a screenshot of the test polygon and a sample of randomly generated 1000 customer points.

Core Java, Java Programming, Oacle Java Career, Java Skills, Java Jobs, Java Tutorial and Materials, Java Cloud, Oacle Java Certification, Java Preparation Exam

1. Create a global temporary table


Ok, so let’s first create a global temporary table =>

create global temporary table TMP_SPATIAL_POINT (
x          number,
y          number,
id         varchar2(512) )
on commit preserve rows

2. Generate a set of random points and populate the global temporary table


Next, let’s run the following script to populate table TMP_SPATIAL_POINT. The script has two variables: maxDistanceInMeters and numberOfPoints in the PL/SQL declaration section that you can adjust as needed. If you want to generate more points, then you can change the value of numberOfPoints from 1000 to a greater number. In this example, I also have maxDistanceInMeters set to 4000. This will ensure that no customer points get generated further than 4000 meters away from the polygon centroid; this can be increased or decreased as needed. The script goes through a loop up to the numberOfPoints variable and uses the SDO_UTIL.POINT_AT_BEARING function to plot points around the centroid of the polygon using randomly generated values. The goal of the script is to quickly create some test data you can play with. Of course, you can also change the test polygon as well.

declare
polygon                     sdo_geometry;
centroid                    sdo_geometry;
newPoint                    sdo_geometry;
maxDistanceInMeters         number := 4000;
numberOfPoints              number := 1000;
type tRecs                  is table of tmp_spatial_point%rowtype;
recs                        tRecs := tRecs();
begin
polygon := SDO_GEOMETRY(2003, 4326, NULL, SDO_ELEM_INFO_ARRAY(1,1003,1),
SDO_ORDINATE_ARRAY(-117.1044,32.680882,-117.08895,32.661808,-117.06148,32.675102, -117.06045,32.697641, -117.09753,32.696774, -117.1044,32.680882));
centroid := SDO_GEOM.SDO_CENTROID( polygon, 0.5 );
recs.extend(numberOfPoints);
for i in 1 .. numberOfPoints loop
  newPoint := SDO_UTIL.POINT_AT_BEARING( start_point => centroid,
                                         bearing => dbms_random.value(0,6.283),
                                         distance => dbms_random.value(1,maxDistanceInMeters));
  recs(i).id := i;
  recs(i).x := newPoint.sdo_point.x;
  recs(i).y := newPoint.sdo_point.y;
end loop;
delete from tmp_spatial_point;
forall i in recs.first .. recs.last
  insert into tmp_spatial_point values ( recs(i).x, recs(i).y, recs(i).id ) ;
commit;
end; 

3. Run SDO_POINTINPOLYGON queries (in serial or parallel)


Ok, now we can start performing queries using the SDO_POINTINPOLYGON function. Here’s a sample query that returns the counts of points that fall inside the polygon. The params parameter is optional; if omitted, a MASK=ANYINTERACT query will be performed.

set timing on
select
  count(*)
from
  table(
  SDO_PointInPolygon(
                      cur =>  cursor(select * from tmp_spatial_point),
                      geom_obj => SDO_GEOMETRY(2003, 4326, NULL,
                                              SDO_ELEM_INFO_ARRAY(1,1003,1),
                                              SDO_ORDINATE_ARRAY( -117.1044,32.680882,
                                                                  -117.08895,32.661808,
                                                                  -117.06148,32.675102,
                                                                  -117.06045,32.697641,
                                                                  -117.09753,32.696774,
                                                                  -117.1044,32.680882)),
                      tol => 0.05,
                      params => 'MASK=INSIDE'
                       )
  ) t  ;

Here's another example of the query using parallelism and with the params parameter omitted.

select /*+ parallel(8) */
  count(*)
from
  table(
  SDO_PointInPolygon(
                      cur =>  cursor(select * from tmp_spatial_point),
                      geom_obj => SDO_GEOMETRY(2003, 4326, NULL,
                                              SDO_ELEM_INFO_ARRAY(1,1003,1),
                                              SDO_ORDINATE_ARRAY( -117.1044,32.680882,
                                                                  -117.08895,32.661808,
                                                                  -117.06148,32.675102,
                                                                  -117.06045,32.697641,
                                                                  -117.09753,32.696774,
                                                                  -117.1044,32.680882)),
                      tol => 0.05
                       )
  ) t  ;

The SDO_POINTINPOLYGON function has been built to leverage Oracle’s parallel processing capability. To demonstrate the magnitude of performance gain when utilizing parallelism, I modified the point generation script in part 2 to populate two million points with a max distance of 5,000 meters from the center point. I then tested the SDO_POINTINPOLYGON query with no parallelism, and then with parallelism of 2, 4 and 8. Here are the elapsed response times:

Level of parallelism Elapsed time 
None  8.56 secs
5.61 secs 
3.18 secs 
1.96 secs 

Utilizing parallelism can greatly shorten query processing times. You can use these scripts in your environment to generate different numbers of points, test various levels of parallelism, and compare the response times.

Source: oracle.com

Related Posts

0 comments:

Post a Comment