使用SQL实现对邮政编码分类规则

6月 19th, 2015

早晨在大阪工作的学长发信息过来要想用SQL实现对日本邮政编码分类的规则.本来学长只需要做概要设计即可,详细设计和开发都是外包给越南人做,但是担心越南的童鞋方法不能够满意,索性就直接把关键的步骤都自己实现掉,想想也是为了世界和平做了不少贡献;

规则是这样的:
1.日本邮编是7位的;
2.最终的结果集只需要两列(前缀 个数);
3.第一优先级是尾数为[00]的邮政编码;
4.第二优先级是前5位数字相同的邮政编码,只展现出现次数大于10次的结果,如果不足10次的邮政编码进入下一优先级统计;
5.第三优先级是前4位数字相同的邮政编码,规则同上;
6.第四优先级是前3位数字相同的邮政编码,规则同上;
7.第五优先级是剩余其它的邮政编码;

— 测试的数据,生成20W个邮政编码;
DROP TABLE TBPOSTTEST;
CREATE TABLE TBPOSTTEST AS
SELECT LPAD(ROUND(DBMS_RANDOM.VALUE (1, 9999999)) , 7, 0) postcode FROM DUAL
CONNECT BY LEVEL <= 200000;

— 实现的sql语句;
SELECT PREFIX, COUNT FROM (
WITH
P0P0 AS (SELECT ‘____00′ AS “PREFIX”, COUNT(*) c0 FROM TBPOSTTEST WHERE SUBSTR(postcode, LENGTH(postcode)-1) = ’00’),
P1P5 AS (SELECT SUBSTR(postcode, 0, 5) cp5, COUNT(*) c5 FROM TBPOSTTEST WHERE SUBSTR(postcode, LENGTH(postcode)-1) <> ’00’ GROUP BY SUBSTR(postcode, 0, 5)),
P2P4 AS (SELECT SUBSTR(cp5, 0, 4) cp4, SUM(c5) c4 FROM P1P5 WHERE c5 < 10 GROUP BY SUBSTR(cp5, 0, 4)),
P3P3 AS (SELECT SUBSTR(cp4, 0, 3) cp3, SUM(c4) c3 FROM P2P4 WHERE c4 < 10 GROUP BY SUBSTR(cp4, 0, 3))
SELECT ‘0’ AS “PRIORITY”, PREFIX, c0 AS “COUNT” FROM P0P0
UNION ALL
SELECT ‘1’, cp5 , c5 FROM P1P5 WHERE c5 >= 10
UNION ALL
SELECT ‘2’, cp4 , c4 FROM P2P4 WHERE c4 >= 10
UNION ALL
SELECT ‘3’, cp3 , c3 FROM P3P3 WHERE c3 >= 10
UNION ALL
SELECT ‘4’, ‘OTHER’ , SUM(c3) c2 FROM P3P3 WHERE c3 < 10)
ORDER BY PRIORITY;

— 测试的DEMO, 把条件改为大于2条;
INSERT INTO TBPOSTTEST VALUES(‘123400’);
INSERT INTO TBPOSTTEST VALUES(‘123500’);
INSERT INTO TBPOSTTEST VALUES(‘123450’);
INSERT INTO TBPOSTTEST VALUES(‘123451’);
INSERT INTO TBPOSTTEST VALUES(‘123452’);
INSERT INTO TBPOSTTEST VALUES(‘123453’);
INSERT INTO TBPOSTTEST VALUES(‘123454’);
INSERT INTO TBPOSTTEST VALUES(‘123444’);
INSERT INTO TBPOSTTEST VALUES(‘123555’);
INSERT INTO TBPOSTTEST VALUES(‘123566’);
INSERT INTO TBPOSTTEST VALUES(‘124444’);
COMMIT;

SELECT PREFIX, COUNT FROM (
WITH
P0P0 AS (SELECT ‘____00′ AS “PREFIX”, COUNT(*) c0 FROM TBPOSTTEST WHERE SUBSTR(postcode, LENGTH(postcode)-1) = ’00’),
P1P5 AS (SELECT SUBSTR(postcode, 0, 5) cp5, COUNT(*) c5 FROM TBPOSTTEST WHERE SUBSTR(postcode, LENGTH(postcode)-1) <> ’00’ GROUP BY SUBSTR(postcode, 0, 5)),
P2P4 AS (SELECT SUBSTR(cp5, 0, 4) cp4, SUM(c5) c4 FROM P1P5 WHERE c5 < 2 GROUP BY SUBSTR(cp5, 0, 4)),
P3P3 AS (SELECT SUBSTR(cp4, 0, 3) cp3, SUM(c4) c3 FROM P2P4 WHERE c4 < 2 GROUP BY SUBSTR(cp4, 0, 3))
SELECT ‘0’ AS “PRIORITY”, PREFIX, c0 AS “COUNT” FROM P0P0
UNION ALL
SELECT ‘1’, cp5 , c5 FROM P1P5 WHERE c5 >= 2
UNION ALL
SELECT ‘2’, cp4 , c4 FROM P2P4 WHERE c4 >= 2
UNION ALL
SELECT ‘3’, cp3 , c3 FROM P3P3 WHERE c3 >= 2
UNION ALL
SELECT ‘4’, ‘OTHER’ , SUM(c3) c2 FROM P3P3 WHERE c3 < 2)
ORDER BY PRIORITY;

— 测试结果;
PREFIX        COUNT
—————— ———-
____00               2
12345                 5
1235                   2
OTHER                2

标签:
目前还没有任何评论.