CN104537062A - Address information extracting method and system - Google Patents

Info

Publication number
CN104537062A
Authority
CN
China
Prior art keywords
address
address information
module
rule
regular expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410836668.XA
Other languages
Chinese (zh)
Inventor
姬东鸿
汪闯闯
白旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Original Assignee
DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd filed Critical DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Priority to CN201410836668.XA priority Critical patent/CN104537062A/en
Publication of CN104537062A publication Critical patent/CN104537062A/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9537: Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention relates to an address information extracting method and system. The method includes the steps that firstly, multiple extracting rules are generated, and an extracting rule set is formed; secondly, a piece of text information is read, address information is extracted from the text information according to the extracting rules in the extracting rule set; thirdly, the structure of the address information is analyzed, and large-to-small administrative divisions included in the address information are segmented; fourthly, whether the superior administrative division exists in the largest administrative division in the address information or not is judged, if yes, the fifth step is executed, and if not, the sixth step is executed; fifthly, the superior administrative division is automatically complemented according to the information stored in a database, the address information is obtained, and the sixth step is executed; sixthly, whether a next text needs to be extracted continuously or not is judged, if yes, the second step is executed, and if not, the seventh step is executed; seventhly, extracting is completed. An address extracted from the text can be clearly and visually displayed, and concept information relevant to the address is analyzed.

Description

An address information extraction method and system
Technical field
The present invention relates to an address information extraction method and system.
Background
With the arrival of the Internet information age, the ability to obtain accurate information from the Internet quickly has become an important measure of social progress. Timely access to information lays the foundation for sound decision-making, which matters greatly for national security, government operations and business activity.
Many applications now need detailed address information, yet the sheer volume of text such as news and microblog posts makes manual searching for address information impractical and cannot guarantee a high recall rate. The main function of the present invention is to extract the address information in text. If the source text comes from microblogs or news, the invention can provide government departments with real-time location information for disasters and accidents, so that countermeasures can be taken quickly. For commercial users such as mobile communication carriers, the addresses where crowds gather are extremely important: using the address information in such text, mobile base stations can be dispatched to relieve sudden communication congestion.
The present invention also parses the address hierarchy and automatically supplements the superior administrative divisions. This information can be supplied to electronic map providers and navigation software to support fast and accurate positioning.
At present there is no standalone software on the market for extracting and parsing the address information in text. Search engines offer a search function, but it is not purpose-built for this task: keyword matching cannot express the user's search need; search engines do not consider the superior division that corresponds to a lower-level division, so they cannot describe a place precisely and locating it is harder still; and search results are unstructured text that applications needing addresses cannot use directly.
Summary of the invention
The technical problem to be solved by the present invention is to provide an address information extraction method that has high recall and accuracy, provides structured address information to multiple applications, and greatly reduces manual work.
The technical solution by which the present invention solves the above technical problem is as follows: an address information extraction method, comprising the following steps:
Step 1: Generate multiple extraction rules based on an address database to form an extraction rule set;
Step 2: Read a piece of text information, and extract address information from it according to the extraction rules in the extraction rule set;
Step 3: Parse the structure of the address information and segment it into the administrative divisions it contains, from largest to smallest;
Step 4: Determine whether the largest administrative division in the address information has a superior administrative division; if so, perform step 5; otherwise, perform step 6;
Step 5: According to the information stored in the database, automatically complete the superior administrative divisions of the address information to obtain a unique, specific address, then perform step 6;
Step 6: Determine whether another text remains to be extracted; if so, perform step 2; otherwise, perform step 7;
Step 7: Address information extraction is complete.
The beneficial effects of the present invention are as follows. First, the addresses extracted from the text are displayed clearly and intuitively, and their related concept information is parsed. Second, the invention segments each extracted address into administrative divisions from largest to smallest, which makes the composition of the address clear and is important input for electronic maps, helping to locate the place referred to quickly. Finally, for an address that contains only a city-level division, the invention automatically fills in its superior administrative division, namely the province; for a district-level address it fills in both the province and the city. In this way a city or district is pinned down within one province or one city, which almost always determines the address uniquely and is clearly of considerable value for fast positioning on an electronic map.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, step 1 specifically comprises the following steps:
Step 1.1: Extract from the address database all data relating to provinces, cities, districts or towns;
Step 1.2: Assemble the data relating to provinces, cities, districts or towns, in turn, into sub-rules that each extract the address information of a province, city, district or town on its own;
Step 1.3: Combine the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
Step 1.4: Each regular expression and the address keywords it relates to form one extraction rule, and the multiple extraction rules form the extraction rule set.
The beneficial effect of above-mentioned further scheme is adopted to be.
Further, the level of a regular expression is determined by the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
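Steps 1.1 to 1.4 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the three-row ADDRESS_DB, the helper names alternation and build_rule_set, and the Chinese place names are all assumptions made for the example. It shows how per-level sub-rules might be nested so that the district-level regular expression contains the province-level and city-level ones as optional prefixes.

```python
import re

# Hypothetical toy address database: (province, city, district) rows.
ADDRESS_DB = [
    ("湖北", "武汉", "洪山区"),
    ("湖北", "武汉", "武昌区"),
    ("湖南", "长沙", "岳麓区"),
]

def alternation(names):
    """Join names into one regex alternation, longest name first."""
    ordered = sorted(set(names), key=lambda n: (-len(n), n))
    return "(?:" + "|".join(re.escape(n) for n in ordered) + ")"

def build_rule_set(db):
    # Steps 1.1-1.2: one sub-rule per administrative level.
    province = alternation(row[0] for row in db) + "省?"
    city = alternation(row[1] for row in db) + "市?"
    district = alternation(row[2] for row in db)
    # Step 1.3: a lower-level regex embeds the higher-level ones as optional prefixes.
    patterns = [
        ("District", f"(?:{province})?(?:{city})?{district}", {row[2] for row in db}),
        ("City", f"(?:{province})?{city}", {row[1] for row in db}),
        ("Province", province, {row[0] for row in db}),
    ]
    # Step 1.4: pair each regex with its keywords; the list runs from low level to high.
    return [
        {"level": level, "regex": re.compile(pattern), "keywords": keywords}
        for level, pattern, keywords in patterns
    ]

RULE_SET = build_rule_set(ADDRESS_DB)
```

Sorting the alternatives longest-first is a design choice of this sketch, so that a longer place name is tried before any shorter name that is its prefix.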
Further, step 2 specifically comprises the following steps:
Step 2.1: Read a piece of text information and match it against the address keywords in the extraction rule set to obtain all matching fields;
Step 2.2: Match one matching field against the regular expressions in the extraction rules, from the lowest level to the highest, to obtain the address information that satisfies a regular expression;
Step 2.3: Determine whether any matching field remains unmatched; if so, perform step 2.2; otherwise, perform step 3.
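One possible reading of steps 2.1 to 2.3 is sketched below. The RULES table, the function names and the sample sentences are assumptions for illustration; in particular, treating a "matching field" as the text span of a keyword hit is an interpretation, not something the source spells out.

```python
import re

# Hypothetical rule set, ordered from the lowest level to the highest.
# Each entry: (level, regular expression, address keywords).
RULES = [
    ("District", re.compile(r"(?:湖北省?)?(?:武汉市?)?洪山区"), ["洪山区"]),
    ("City", re.compile(r"(?:湖北省?)?武汉市?"), ["武汉"]),
    ("Province", re.compile(r"湖北省?"), ["湖北"]),
]

def matching_fields(text):
    """Step 2.1: spans in the text where any address keyword occurs."""
    fields = set()
    for _, _, keywords in RULES:
        for kw in keywords:
            start = text.find(kw)
            while start != -1:
                fields.add((start, start + len(kw)))
                start = text.find(kw, start + 1)
    return sorted(fields)

def match_field(text, field):
    """Step 2.2: try the regexes from low level to high level on one field."""
    for level, regex, _ in RULES:
        for m in regex.finditer(text):
            if m.start() <= field[0] and field[1] <= m.end():
                return level, m.group(), m.span()
    return None

def extract(text):
    """Step 2.3: repeat until no unmatched field remains."""
    found = []
    for field in matching_fields(text):
        hit = match_field(text, field)
        if hit is not None and hit not in found:
            found.append(hit)
    return found
```

The keyword pass is a cheap substring search; the more expensive regular expressions run only on text that contains at least one keyword.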
The technical solution by which the present invention solves the above technical problem is as follows: an address information extraction system, comprising a rule extraction module, an address extraction module, a parsing module, a judging module, a completion module and a continuation judging module;
The rule extraction module generates multiple extraction rules based on an address database to form an extraction rule set;
The address extraction module reads a piece of text information and extracts address information from it according to the extraction rules in the extraction rule set;
The parsing module parses the structure of the address information and segments it into the administrative divisions it contains, from largest to smallest;
The judging module determines whether the largest administrative division in the address information has a superior administrative division; if so, it triggers the completion module; otherwise, it triggers the continuation judging module;
The completion module automatically completes the superior administrative divisions of the address information according to the information stored in the database, obtains a unique, specific address, and triggers the continuation judging module;
The continuation judging module determines whether another text remains to be extracted; if so, it triggers the address extraction module; otherwise, address information extraction is complete.
The beneficial effects of the present invention are as follows. First, the addresses extracted from the text are displayed clearly and intuitively, and their related concept information is parsed. Second, the invention segments each extracted address into administrative divisions from largest to smallest, which makes the composition of the address clear and is important input for electronic maps, helping to locate the place referred to quickly. Finally, for an address that contains only a city-level division, the invention automatically fills in its superior administrative division, namely the province; for a district-level address it fills in both the province and the city. In this way a city or district is pinned down within one province or one city, which almost always determines the address uniquely and is clearly of considerable value for fast positioning on an electronic map.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the rule extraction module comprises a data extraction module, a sub-rule module, a combination module and a collection module;
The data extraction module extracts from the address database all data relating to provinces, cities, districts or towns;
The sub-rule module assembles the data relating to provinces, cities, districts or towns, in turn, into sub-rules that each extract the address information of a province, city, district or town on its own;
The combination module combines the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
The collection module forms one extraction rule from each regular expression and the address keywords it relates to, and the multiple extraction rules form the extraction rule set.
Further, the level of a regular expression is determined by the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
Further, the address extraction module comprises a reading module and a matching module;
The reading module reads a piece of text information and matches it against the address keywords in the extraction rule set to obtain all matching fields;
The matching module matches all matching fields against the regular expressions in the extraction rules, from the lowest level to the highest, to obtain the address information that satisfies a regular expression.
Address extraction language and platform: its main function is to describe addresses, their features, and the relations between addresses.
Extraction of address-related concepts: these mainly comprise the position of the address in the text (start and end points), the type of the address, and each atomic-level administrative division contained in the address.
For city-level and district-level addresses, the superior administrative divisions can be supplemented automatically.
First, the present invention extracts addresses by longest match: as long as a sequence of address elements runs from larger to smaller administrative divisions, it is treated as a single address, so that the extracted address is as precise as possible, down to the smallest region. Second, the range of addresses covered by the invention is very wide and includes almost every familiar type: province, city, district, town (township), village (ridge), group (team), street (street, avenue, road), tower (commercial building), residential quarter (community), national (provincial) highway, unit, building, floor and number. This allows the invention to find the addresses in text with high recall.
Brief description of the drawings
Fig. 1 is a flow chart of the address information extraction method of the present invention;
Fig. 2 is a structural block diagram of the address information extraction system of the present invention;
Fig. 3 shows the relationship between the addresses extracted by regular expression 1 and regular expression 2;
Fig. 4 is a flow chart of a specific embodiment of the present invention.
In the drawings, the components denoted by the reference numerals are as follows:
1, rule extraction module, 2, address extraction module, 3, parsing module, 4, judging module, 5, completion module, 6, continuation judging module, 11, data extraction module, 12, sub-rule module, 13, combination module, 14, collection module, 21, reading module, 22, matching module.
Detailed description
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, the address information extraction method of the present invention comprises the following steps:
Step 1: Extract from the address database all data relating to provinces, cities, districts or towns;
Step 2: Assemble the data relating to provinces, cities, districts or towns, in turn, into sub-rules that each extract the address information of a province, city, district or town on its own;
Step 3: Combine the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
Step 4: Each regular expression and the address keywords it relates to form one extraction rule, and the multiple extraction rules form the extraction rule set.
Step 5: Read a piece of text information, and extract address information from it according to the extraction rules in the extraction rule set;
Step 6: Parse the structure of the address information and segment it into the administrative divisions it contains, from largest to smallest;
Step 7: Determine whether the largest administrative division in the address information has a superior administrative division; if so, perform step 8; otherwise, perform step 9;
Step 8: According to the information stored in the database, automatically complete the superior administrative divisions of the address information to obtain a unique, specific address, then perform step 9;
Step 9: Determine whether another text remains to be extracted; if so, perform step 5; otherwise, perform step 10;
Step 10: Address information extraction is complete.
The level of a regular expression is determined by the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
Step 2 specifically comprises the following steps:
Step 2.1: Read a piece of text information and match it against the address keywords in the extraction rule set to obtain all matching fields;
Step 2.2: Match one matching field against the regular expressions in the extraction rules, from the lowest level to the highest, to obtain the address information that satisfies a regular expression;
Step 2.3: Determine whether any matching field remains unmatched; if so, perform step 2.2; otherwise, perform step 3.
As shown in Fig. 2, the address information extraction system of the present invention comprises a rule extraction module 1, an address extraction module 2, a parsing module 3, a judging module 4, a completion module 5 and a continuation judging module 6;
The rule extraction module 1 generates multiple extraction rules based on an address database to form an extraction rule set;
The address extraction module 2 reads a piece of text information and extracts address information from it according to the extraction rules in the extraction rule set;
The parsing module 3 parses the structure of the address information and segments it into the administrative divisions it contains, from largest to smallest;
The judging module 4 determines whether the largest administrative division in the address information has a superior administrative division; if so, it triggers the completion module 5; otherwise, it triggers the continuation judging module 6;
The completion module 5 automatically completes the superior administrative divisions of the address information according to the information stored in the database, obtains a unique, specific address, and triggers the continuation judging module 6;
The continuation judging module 6 determines whether another text remains to be extracted; if so, it triggers the address extraction module 2; otherwise, address information extraction is complete.
The rule extraction module 1 comprises a data extraction module 11, a sub-rule module 12, a combination module 13 and a collection module 14;
The data extraction module 11 extracts from the address database all data relating to provinces, cities, districts or towns;
The sub-rule module 12 assembles the data relating to provinces, cities, districts or towns, in turn, into sub-rules that each extract the address information of a province, city, district or town on its own;
The combination module 13 combines the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
The collection module 14 forms one extraction rule from each regular expression and the address keywords it relates to, and the multiple extraction rules form the extraction rule set.
The level of a regular expression is determined by the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
The address extraction module 2 comprises a reading module 21 and a matching module 22;
The reading module 21 reads a piece of text information and matches it against the address keywords in the extraction rule set to obtain all matching fields;
The matching module 22 matches all matching fields against the regular expressions in the extraction rules, from the lowest level to the highest, to obtain the address information that satisfies a regular expression.
The following is a specific embodiment of the present invention:
1. First, the province, city, district and town data are taken from the database and assembled, in turn, into rules that each extract a province, city, district or town on its own. These rules are then combined to extract the various forms of address found in text. The combinations are as follows:
Province: (province regex)(province)?
City: ((province regex)(province)?)?(city regex)(city)?
District: ((province regex)(province)?)?((city regex)(city)?)?(district regex)
Here the province regex does not include the word "province" and the city regex does not include the word "city", whereas the district regex is the full name and includes a suffix word such as "district". For the province, city and district regexes, restrictions can be added progressively during corpus training, to minimize errors caused by ambiguity or by chance combinations of logically unrelated words in an article.
The district-level expression is then combined with other address components to form the expression that extracts addresses below the district level. Its form is as follows:
Address: (((province regex)(province)?)?((city regex)(city)?)?(district regex))?(address regex);
In the expressions above, (a)(b)? denotes a string in which one a is followed by zero or one b.
Because addresses below the district level are numerous and varied, the address regex is also quite complex. It is composed, according to the various possible address formats, of atomic divisions such as town (township), village (ridge), group (team), street (street, avenue, road), tower (commercial building), residential quarter (community), national (provincial) highway, unit, building, floor and number. A few typical ones are explained below.
Address regex: 1 | 2 | 3 | 4 | 5 | 6 | 7;
Where:
1. village: (town regex)?([Chinese character]{1,7}village)([digit]{1,3}(group|team))?
2. town: (town regex)(([Chinese character]{1,7}village)?[digit]{1,3}(group|team))?
3. road: ([Chinese character]{2,6}road)([Chinese character]{2,x}(village|lane|section));
4. road: [Chinese character]{2,6}(main road|(?<!national|provincial)road|street(road)?);
5. residential quarter: [Chinese character]{2,5}(residential quarter|community);
6. national highway, provincial highway: [digit]{1,3}(national highway|provincial highway);
7. lane: [Chinese character]{2,5}(lane)([Chinese character]{2,5}(street|village))?
In the expressions above, (a)(b)? denotes a string in which one a is followed by zero or one b.
For [Chinese character] above, a different character set is used for each address type, according to which Chinese characters that type actually uses; the expressions above are simplified renderings of much longer regular expressions. Some addresses can be matched by two of these regular expressions: for example, "Lin Pu village, Lin Shan town" is matched by both 1 and 2, but the address finally extracted is not duplicated. The relationship between the addresses extracted by 1 and by 2 is shown in Fig. 3. As the figure shows, 80 to 90 percent of the addresses they extract are identical, and only a small portion is extracted by one of them alone. This redundancy is nevertheless necessary: at the cost of some redundancy, the addresses that each expression extracts independently complement each other, which raises the recall of extraction. Trading redundancy for accuracy in this way is very common in our address extraction rules.
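The overlap between expressions 1 and 2 can be illustrated with a simplified sketch. Everything concrete here is an assumption: the Chinese suffix characters (镇, 村, 组, 队), the single hypothetical town in TOWN, the use of the whole CJK range for [Chinese character], and the sample addresses. The source states that the real expressions are longer and use a restricted character set per address type.

```python
import re

HAN = r"[\u4e00-\u9fff]"   # assumption: any CJK unified ideograph
TOWN = r"(?:林山镇)"        # hypothetical town sub-rule taken from the database

# Simplified forms of expression 1 (village) and expression 2 (town).
RULE_1 = re.compile(rf"(?:{TOWN})?{HAN}{{1,7}}村(?:\d{{1,3}}[组队])?")
RULE_2 = re.compile(rf"{TOWN}(?:(?:{HAN}{{1,7}}村)?\d{{1,3}}[组队])?")

def union_of_matches(candidates):
    """Keep every candidate matched by rule 1 or rule 2, without duplicates."""
    result = []
    for cand in candidates:
        matched = RULE_1.fullmatch(cand) or RULE_2.fullmatch(cand)
        if matched and cand not in result:
            result.append(cand)
    return result
```

In this sketch a town followed by a village and a group number is matched by both expressions, a bare village only by expression 1, and a town followed directly by a group number only by expression 2; the union keeps each address once.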
Once all the rules have been generated, addresses are extracted from the text. Extraction proceeds from smaller administrative divisions to larger ones, which avoids extracting a second address inside an address already extracted. Take "Hongshan District, Wuhan City, Hubei Province": the focus of this address is "Hongshan District". If extraction ran from the largest division to the smallest, it would produce three addresses, "Hubei Province", "Wuhan City" and "Hongshan District", which clearly departs from the place actually meant. If extraction runs from the smallest division to the largest, the complete "Hongshan District, Wuhan City, Hubei Province" is extracted first; from its position in the text we can then avoid extracting "Hubei Province" and "Wuhan City" again inside it, and the focus of the address remains "Hongshan District". Addresses are therefore extracted from small to large administrative divisions, until all addresses satisfying the regular expressions have been extracted.
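The small-to-large order can be sketched as follows. The three patterns, the function name and the sample sentence are assumptions for illustration; the point shown is only the span check that stops a province or city from being extracted again inside a district-level address that was already found.

```python
import re

# Hypothetical patterns, ordered from the smallest division to the largest.
PATTERNS = [
    ("District", re.compile(r"(?:湖北省?)?(?:武汉市?)?洪山区")),
    ("City", re.compile(r"(?:湖北省?)?武汉市?")),
    ("Province", re.compile(r"湖北省?")),
]

def extract_small_to_large(text):
    taken = []      # spans already claimed by a lower-level address
    results = []
    for level, pattern in PATTERNS:
        for m in pattern.finditer(text):
            start, end = m.span()
            if any(ts <= start and end <= te for ts, te in taken):
                continue  # lies inside an address that was already extracted
            taken.append((start, end))
            results.append((level, m.group(), start, end))
    return sorted(results, key=lambda r: r[2])
```

For a sentence that contains the full district-level address and a separate later mention of the city, this yields one district-level address and one city-level address, rather than three fragments of the first address.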
2. Segmentation of the address into administrative divisions
After all addresses have been extracted, the structure of each address is parsed and the address is segmented into the administrative divisions it contains, from largest to smallest. The keyword at the tail of the address (such as town or village) first determines the address type; then, according to the possible forms of that type of address, the address is parsed into administrative divisions from front to back until parsing is complete.
For example, for the address "Hongshan District, Wuhan City, Hubei Province", the final parsing result is "Hongshan District, Wuhan City, Hubei Province (District) 266275 Hubei Province, Province| Wuhan City, City| Hongshan District, District", where "(District)" is the address type, "266275" is the position of the address in the text, and "Hubei Province, Province| Wuhan City, City| Hongshan District, District" is the result of the administrative division parsing.
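A result record of this shape could be produced along the following lines. This sketch swaps in a simpler technique than the one described: instead of keying on the tail keyword and then parsing front to back, it uses one regular expression with a named group per level. The pattern, the function names, and the choice to print the position as separate start and end offsets are all assumptions.

```python
import re

LEVELS = ("Province", "City", "District")

# Hypothetical district-level pattern with one named group per level.
DISTRICT = re.compile(
    r"(?P<Province>湖北省?)?(?P<City>武汉市?)?(?P<District>洪山区)"
)

def segment(text):
    """Find one address and split it into divisions, largest first."""
    m = DISTRICT.search(text)
    if m is None:
        return None
    divisions = [(lvl, m.group(lvl)) for lvl in LEVELS if m.group(lvl)]
    return {
        "address": m.group(),
        "type": divisions[-1][0],   # the smallest division gives the address type
        "span": m.span(),
        "divisions": divisions,
    }

def render(parsed):
    """Format as: address(Type) start end Name,Level|Name,Level|..."""
    start, end = parsed["span"]
    parts = "|".join(f"{name},{lvl}" for lvl, name in parsed["divisions"])
    return f'{parsed["address"]}({parsed["type"]}) {start} {end} {parts}'
```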
3. Automatic supplementation of superior administrative divisions
For city-level and district-level addresses, the superior divisions are added automatically while the administrative divisions are being parsed. The parsing result for "Hongshan District" is "Hubei Province, Province| Wuhan, City| Hongshan District, District", and the parsing result for "Wuhan" is "Hubei Province, Province| Wuhan, City". This parsing result completes the address information and makes it easy to locate one specific, unique place.
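The completion step amounts to walking up a parent table until a province is reached. A minimal sketch, assuming a hypothetical in-memory PARENTS table in place of the address database:

```python
# Hypothetical parent table: division name -> (level, parent name or None).
PARENTS = {
    "湖北省": ("Province", None),
    "武汉": ("City", "湖北省"),
    "洪山区": ("District", "武汉"),
}

def complete(name):
    """Return the division chain from the province down to the given name."""
    chain = []
    current = name
    while current is not None:
        level, parent = PARENTS[current]
        chain.append((current, level))
        current = parent
    chain.reverse()  # largest division first
    return "|".join(f"{n},{lvl}" for n, lvl in chain)
```

With this table, complete("洪山区") gives the province, city and district chain, and complete("武汉") gives the province and city chain, mirroring the two parsing results quoted above.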
As shown in Fig. 4, the flow of the specific embodiment of the present invention is as follows:
1. Read the database and generate the rules, as described in core step 1.
2. Read in a text and extract addresses according to the rules obtained in 1.
3. Divide the extracted addresses into administrative divisions, as described in core step 2.
4. For each address divided in 3, if it is only at city level or district level, supplement its superior administrative divisions from the database, as described in core step 3.
5. If more text remains, go to 2 and continue processing; otherwise go to 6.
6. End and exit the program.
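The six-step flow above can be condensed into one short driver loop. This is a toy sketch under stated assumptions: PARENTS stands in for the database, build_rules and process are hypothetical names, and a single "run of known division names" expression replaces the full leveled rule set.

```python
import re

# Hypothetical database: division name -> superior division (None for a province).
PARENTS = {"洪山区": "武汉市", "武汉市": "湖北省", "湖北省": None}

def build_rules(parents):
    """1. Read the database and generate the rules."""
    names = "|".join(re.escape(n) for n in sorted(parents, key=len, reverse=True))
    address_re = re.compile(f"(?:{names})+")   # a run of known division names
    division_re = re.compile(names)            # one division name
    return address_re, division_re

def process(texts, parents=PARENTS):
    address_re, division_re = build_rules(parents)
    results = []
    for text in texts:                                     # 5. loop while texts remain
        for m in address_re.finditer(text):                # 2. extract addresses
            divisions = division_re.findall(m.group())     # 3. divide into divisions
            parent = parents[divisions[0]]
            while parent is not None:                      # 4. supplement superiors
                divisions.insert(0, parent)
                parent = parents[parent]
            results.append(divisions)
    return results                                         # 6. end
```

The toy expression does not check that divisions appear in large-to-small order; the leveled regular expressions described earlier are what enforce that in the source.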
The above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. An address information extraction method, characterized in that it comprises the following steps:
Step 1: Generate multiple extraction rules based on an address database to form an extraction rule set;
Step 2: Read a piece of text information, and extract address information from it according to the extraction rules in the extraction rule set;
Step 3: Parse the structure of the address information and segment it into the administrative divisions it contains, from largest to smallest;
Step 4: Determine whether the largest administrative division in the address information has a superior administrative division; if so, perform step 5; otherwise, perform step 6;
Step 5: According to the information stored in the database, automatically complete the superior administrative divisions of the address information to obtain a unique, specific address, then perform step 6;
Step 6: Determine whether another text remains to be extracted; if so, perform step 2; otherwise, perform step 7;
Step 7: Address information extraction is complete.
2. The address information extraction method according to claim 1, characterized in that step 1 specifically comprises the following steps:
Step 1.1: Extract from the address database all data relating to provinces, cities, districts or towns;
Step 1.2: Assemble the data relating to provinces, cities, districts or towns, in turn, into sub-rules that each extract the address information of a province, city, district or town on its own;
Step 1.3: Combine the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
Step 1.4: Each regular expression and the address keywords it relates to form one extraction rule, and the multiple extraction rules form the extraction rule set.
3. The address information extraction method according to claim 2, characterized in that the level of a regular expression is determined by the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
4. The address information extraction method according to claim 2 or 3, characterized in that step 2 specifically comprises the following steps:
Step 2.1: Read a piece of text information and match it against the address keywords in the extraction rule set to obtain all matching fields;
Step 2.2: Match one matching field against the regular expressions in the extraction rules, from the lowest level to the highest, to obtain the address information that satisfies a regular expression;
Step 2.3: Determine whether any matching field remains unmatched; if so, perform step 2.2; otherwise, perform step 3.
5. An address information extraction system, characterized in that it comprises a rule extraction module, an address extraction module, a parsing module, a judging module, a completion module, and a continuation judging module;
The rule extraction module generates multiple extraction rules based on an address database, forming an extraction rule set;
The address extraction module reads a text and extracts address information from it according to the extraction rules in the extraction rule set;
The parsing module parses the structure of the address information and splits it into the administrative divisions it contains, ordered from largest to smallest;
The judging module determines whether the largest administrative division in the address information still has a higher-level administrative division; if so, it triggers the completion module; otherwise, it triggers the continuation judging module;
The completion module automatically completes the higher-level administrative divisions of the address information according to the information stored in the database, obtains a unique, specific address, and then triggers the continuation judging module;
The continuation judging module determines whether to continue extracting from the next text; if so, it triggers the address extraction module; otherwise, address information extraction is complete.
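The parsing, judging, and completion modules of claim 5 can be sketched as below. The `PARENT` table is a hypothetical stand-in for the address database and assumes every division name has exactly one parent; real data would need disambiguation, since the same district name can occur in more than one city. Splitting an address on the suffixes 省, 市, and 区 is likewise a simplification made only for this example.

```python
import re

# Hypothetical child -> parent lookup standing in for the address database.
PARENT = {
    "洪山区": "武汉市",
    "南山区": "深圳市",
    "武汉市": "湖北省",
    "深圳市": "广东省",
}

# Simplified splitter: a run of characters ending in a division suffix.
DIVISION = re.compile(r"[^省市区]+(?:省|市|区)")


def parse_divisions(address):
    """Parsing module: split an address into divisions, largest first."""
    return DIVISION.findall(address)


def complete_address(address):
    """Judging and completion modules: prepend missing higher divisions."""
    divisions = parse_divisions(address)
    if not divisions:
        return address  # nothing recognisable to complete
    # Judging: does the largest division still have a higher-level division?
    while divisions[0] in PARENT:
        # Completion: look the parent up and put it in front.
        divisions.insert(0, PARENT[divisions[0]])
    return "".join(divisions)


def process_addresses(extracted):
    """Continuation loop: complete one extracted address per text in turn."""
    return [complete_address(address) for address in extracted]
```

For example, `complete_address("洪山区")` walks up the lookup table twice and returns `湖北省武汉市洪山区`, while an address that already starts at province level is returned unchanged.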
6. The address information extraction system according to claim 5, characterized in that the rule extraction module comprises a data extraction module, a sub-rule module, a combination module, and a collection module;
The data extraction module extracts from the address database all data relating to provinces, cities, districts, or towns;
The sub-rule module combines the province, city, district, or town data in turn into sub-rules, each of which extracts address information for a province, city, district, or town on its own;
The combination module combines the sub-rules to obtain multiple regular expressions of different levels, together with address keywords;
The collection module forms one extraction rule from each regular expression and the address keywords it involves, and the multiple extraction rules form the extraction rule set.
7. The address information extraction system according to claim 6, characterized in that the level of each regular expression is ranked high or low according to the size of the administrative division it covers; each lower-level regular expression contains one regular expression from every level above it.
8. The address information extraction system according to claim 6 or 7, characterized in that the address extraction module comprises a reading module and a matching module;
The reading module reads a text and matches it against the address keywords in the extraction rule set, obtaining all matching fields;
The matching module matches all matching fields against the regular expressions in the extraction rules in order of level from low to high, obtaining the address information that satisfies a regular expression.
CN201410836668.XA 2014-12-29 2014-12-29 Address information extracting method and system Pending CN104537062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410836668.XA CN104537062A (en) 2014-12-29 2014-12-29 Address information extracting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410836668.XA CN104537062A (en) 2014-12-29 2014-12-29 Address information extracting method and system

Publications (1)

Publication Number Publication Date
CN104537062A true CN104537062A (en) 2015-04-22

Family

ID=52852590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410836668.XA Pending CN104537062A (en) 2014-12-29 2014-12-29 Address information extracting method and system

Country Status (1)

Country Link
CN (1) CN104537062A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468791A (en) * 2016-01-05 2016-04-06 北京信息科技大学 Geographic location entity integrity expression method based on interactive question and answer community-Baidu knows
CN106202028A (en) * 2015-04-30 2016-12-07 阿里巴巴集团控股有限公司 A kind of address information recognition methods and device
CN106649802A (en) * 2016-12-29 2017-05-10 广东精规划信息科技股份有限公司 Address cloud service platform
CN106682175A (en) * 2016-12-29 2017-05-17 华南师范大学 Method and system for matching address
CN106709065A (en) * 2017-01-19 2017-05-24 国家电网公司 Standardization processing method and standardized processing device for address information
CN106777377A (en) * 2017-02-09 2017-05-31 辛国臣 Logistics odd numbers generation method and device
CN106934631A (en) * 2015-12-29 2017-07-07 阿里巴巴集团控股有限公司 Name data processing method and processing device
CN106959961A (en) * 2016-01-11 2017-07-18 阿里巴巴集团控股有限公司 A kind of Address Recognition method and device
CN107025232A (en) * 2016-01-29 2017-08-08 阿里巴巴集团控股有限公司 The processing method and processing device of address information in logistics system
CN107145577A (en) * 2017-05-08 2017-09-08 上海东方网络金融服务有限公司 Address standardization method, device, storage medium and computer
CN108038090A (en) * 2017-12-26 2018-05-15 北京明朝万达科技股份有限公司 A kind for the treatment of method and apparatus of Text Address
CN109615290A (en) * 2018-11-28 2019-04-12 北京京东尚科信息技术有限公司 For obtaining the method, apparatus, system and medium of address for service
CN109871435A (en) * 2019-03-01 2019-06-11 陈包容 The method of social account is extracted from text
CN109872098A (en) * 2018-12-12 2019-06-11 平安科技(深圳)有限公司 Logistics address resolution method and computer equipment based on the dispatching of vehicle insurance declaration form
CN111382554A (en) * 2018-12-11 2020-07-07 顺丰科技有限公司 Floor information extraction method and system
CN111753515A (en) * 2020-06-24 2020-10-09 广东科杰通信息科技有限公司 Address information extraction and matching method for realizing entity positioning
CN111882013A (en) * 2020-07-31 2020-11-03 平安国际融资租赁有限公司 Equipment asset monitoring method and device, computer equipment and storage medium
CN111914557A (en) * 2020-07-31 2020-11-10 上海燕汐软件信息科技有限公司 Address resolution method, device, equipment and computer readable storage medium
CN112347221A (en) * 2021-01-08 2021-02-09 北京安泰伟奥信息技术有限公司 House address similarity analysis method and device
CN112835922A (en) * 2021-01-29 2021-05-25 上海寻梦信息技术有限公司 Address division classification method, system, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020156779A1 (en) * 2001-09-28 2002-10-24 Elliott Margaret E. Internet search engine
CN101452555A (en) * 2008-12-31 2009-06-10 中国建设银行股份有限公司 Method for enquiring personal credit information, system and personal credit enquiring system
CN101882163A (en) * 2010-06-30 2010-11-10 中国科学院地理科学与资源研究所 Fuzzy Chinese address geographic evaluation method based on matching rule
CN103309992A (en) * 2013-06-20 2013-09-18 武汉大学 Position information extraction method facing natural language
JP2013228888A (en) * 2012-04-25 2013-11-07 Nippon Telegr & Teleph Corp <Ntt> Region estimation device, method and program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020156779A1 (en) * 2001-09-28 2002-10-24 Elliott Margaret E. Internet search engine
CN101452555A (en) * 2008-12-31 2009-06-10 中国建设银行股份有限公司 Method for enquiring personal credit information, system and personal credit enquiring system
CN101882163A (en) * 2010-06-30 2010-11-10 中国科学院地理科学与资源研究所 Fuzzy Chinese address geographic evaluation method based on matching rule
JP2013228888A (en) * 2012-04-25 2013-11-07 Nippon Telegr & Teleph Corp <Ntt> Region estimation device, method and program
CN103309992A (en) * 2013-06-20 2013-09-18 武汉大学 Position information extraction method facing natural language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
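Du Ping: "Research on Ontology-Based Recognition and Extraction of Chinese Administrative Division Place Names", China Doctoral Dissertations Full-text Database, Philosophy and Humanities Series *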

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202028B (en) * 2015-04-30 2019-10-11 阿里巴巴集团控股有限公司 A kind of address information recognition methods and device
CN106202028A (en) * 2015-04-30 2016-12-07 阿里巴巴集团控股有限公司 A kind of address information recognition methods and device
CN106934631A (en) * 2015-12-29 2017-07-07 阿里巴巴集团控股有限公司 Name data processing method and processing device
CN105468791A (en) * 2016-01-05 2016-04-06 北京信息科技大学 Geographic location entity integrity expression method based on interactive question and answer community-Baidu knows
CN105468791B (en) * 2016-01-05 2019-11-15 北京信息科技大学 A kind of integrality expression for the geographical location entity known based on interacting Question-Answer community-Baidu
CN106959961A (en) * 2016-01-11 2017-07-18 阿里巴巴集团控股有限公司 A kind of Address Recognition method and device
CN107025232A (en) * 2016-01-29 2017-08-08 阿里巴巴集团控股有限公司 The processing method and processing device of address information in logistics system
CN106649802A (en) * 2016-12-29 2017-05-10 广东精规划信息科技股份有限公司 Address cloud service platform
CN106682175A (en) * 2016-12-29 2017-05-17 华南师范大学 Method and system for matching address
CN106709065A (en) * 2017-01-19 2017-05-24 国家电网公司 Standardization processing method and standardized processing device for address information
CN106709065B (en) * 2017-01-19 2020-08-04 国家电网公司 Address information standardization processing method and device
CN106777377A (en) * 2017-02-09 2017-05-31 辛国臣 Logistics odd numbers generation method and device
CN107145577A (en) * 2017-05-08 2017-09-08 上海东方网络金融服务有限公司 Address standardization method, device, storage medium and computer
CN108038090A (en) * 2017-12-26 2018-05-15 北京明朝万达科技股份有限公司 A kind for the treatment of method and apparatus of Text Address
CN109615290A (en) * 2018-11-28 2019-04-12 北京京东尚科信息技术有限公司 For obtaining the method, apparatus, system and medium of address for service
CN111382554A (en) * 2018-12-11 2020-07-07 顺丰科技有限公司 Floor information extraction method and system
CN109872098A (en) * 2018-12-12 2019-06-11 平安科技(深圳)有限公司 Logistics address resolution method and computer equipment based on the dispatching of vehicle insurance declaration form
CN109871435A (en) * 2019-03-01 2019-06-11 陈包容 The method of social account is extracted from text
CN111753515A (en) * 2020-06-24 2020-10-09 广东科杰通信息科技有限公司 Address information extraction and matching method for realizing entity positioning
CN111882013A (en) * 2020-07-31 2020-11-03 平安国际融资租赁有限公司 Equipment asset monitoring method and device, computer equipment and storage medium
CN111914557A (en) * 2020-07-31 2020-11-10 上海燕汐软件信息科技有限公司 Address resolution method, device, equipment and computer readable storage medium
CN112347221A (en) * 2021-01-08 2021-02-09 北京安泰伟奥信息技术有限公司 House address similarity analysis method and device
CN112835922A (en) * 2021-01-29 2021-05-25 上海寻梦信息技术有限公司 Address division classification method, system, device and storage medium

Similar Documents

Publication Publication Date Title
CN104537062A (en) Address information extracting method and system
CN101313300B (en) Local search
CN103186524B (en) A kind of place name identification method and apparatus
CN109033086A (en) A kind of address resolution, matched method and device
CN103440311A (en) Method and system for identifying geographical name entities
CN102880721B (en) The implementation method of vertical search engine
CN105528372A (en) An address search method and apparatus
CN104965847A (en) Information displaying method and apparatus
CN103902535A (en) Method, device and system for obtaining associational word
CN105068989A (en) Place name and address extraction method and apparatus
CN103885983A (en) Travelling route determining method, and optimizing method and device
CN101196915A (en) Electronic map device and its implementing method
CN103473289A (en) Device and method for completing communication addresses
CN107016084A (en) A kind of place name address quickly positions the method with inquiry
CN101605126A (en) A kind of method and system of multi-protocol data Classification and Identification
CN104679801A (en) Point of interest searching method and point of interest searching device
CN110399448B (en) Chinese place name address searching and matching method, terminal and computer readable storage medium
CN103250151A (en) Server, information-anagement method, information-management program, and computer-readable recording medium with said program recorded thereon
CN103473238A (en) Distribution address positioning system and method
CN108268445A (en) A kind of method and device for handling address information
CN103324749B (en) A kind of spatialization parsing based on received text address and method for correcting error
CN107644050A (en) A kind of querying method and device of the Hbase based on solr
CN103076894A (en) Method and equipment for building input entries for object identity information according to object identity information
CN106155998A (en) A kind of data processing method and device
CN112286927A (en) Method, device and storage medium for inquiring user data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150422