Detecting and Converting Character Encodings in PHP

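Web pages come in many encodings, such as UTF-8 and GB2312. If UTF-8 Chinese text is displayed on a page encoded as GB2312, it will come out garbled. This often happens when you read the title of another web page or an RSS feed. The fix is to determine the encoding after reading the data and before processing it, then convert it to the encoding you need. PHP has a function called mb_detect_encoding() that reports which encoding a string most likely uses.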
 

Here is how the PHP manual describes it:

mb_detect_encoding

(PHP 4 >= 4.0.6, PHP 5) 

mb_detect_encoding — Detect character encoding 

 

Description

string mb_detect_encoding ( string str [, mixed encoding_list [, bool strict]] ) 

mb_detect_encoding() detects character encoding in string str. It returns detected character encoding. 

encoding_list is list of character encoding. Encoding order may be specified by array or comma separated list string. 

If encoding_list is omitted, detect_order is used.

Example 1. mb_detect_encoding() example

<?php
/* Detect character encoding with current detect_order */
echo mb_detect_encoding($str);

/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
echo mb_detect_encoding($str, "auto");

/* Specify encoding_list character encoding by comma separated list */
echo mb_detect_encoding($str, "JIS, eucjp-win, sjis-win");

/* Use array to specify encoding_list  */
$ary[] = "ASCII";
$ary[] = "JIS";
$ary[] = "EUC-JP";
echo mb_detect_encoding($str, $ary);
?>
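The manual example uses Japanese encodings. For the UTF-8 versus GB2312 case discussed in this post, a detection call might look like the sketch below. The sample strings and the candidate list are assumptions chosen for illustration, and the third argument turns on strict mode so that only an encoding in which the whole string is valid is accepted.

```php
<?php
// Hypothetical sample data: the same text in two encodings.
$utf8 = "你好"; // assumes this source file is saved as UTF-8
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

// Candidate encodings to test (an assumption for this sketch).
$candidates = array('ASCII', 'UTF-8', 'GB2312');

// Third argument true = strict mode.
$utf8Guess = mb_detect_encoding($utf8, $candidates, true);
$gbGuess   = mb_detect_encoding($gb, $candidates, true);

var_dump($utf8Guess);
var_dump($gbGuess);
```

Note that mbstring typically reports GB2312 under its canonical name EUC-CN, so comparing the result against the literal string "GB2312" may not match.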

 

So if $str is encoded as UTF-8 and needs to be converted to GB2312, you can use $str = iconv('UTF-8', 'GB2312', $str); after that, $str is encoded as GB2312.

<?php
$str = "你好"; /* assume $str is encoded as UTF-8 */

/* convert only when the string was detected as UTF-8 */
if (mb_detect_encoding($str) === "UTF-8") {
    $str = iconv('UTF-8', 'GB2312', $str);
}

echo $str;
?>
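The detect-then-convert steps above can also be wrapped in a small helper. This is only a sketch: convert_to() is a hypothetical function name, not part of PHP, and the default candidate list is an assumption. It uses mb_convert_encoding() rather than iconv() so that the name returned by mb_detect_encoding() can be passed straight back to mbstring.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): convert $str to $target,
 * guessing the source encoding from a list of candidates.
 * Returns false when none of the candidates fits.
 */
function convert_to($str, $target = 'GB2312', $candidates = array('UTF-8', 'GB2312'))
{
    // Strict mode: only accept an encoding in which $str is fully valid.
    $from = mb_detect_encoding($str, $candidates, true);
    if ($from === false) {
        return false;
    }
    // Pass the detected name back to mbstring for the conversion.
    return mb_convert_encoding($str, $target, $from);
}

$utf8 = "你好"; // assumes this source file is saved as UTF-8
$gb   = convert_to($utf8, 'GB2312');
$back = convert_to($gb, 'UTF-8');

echo bin2hex($gb), "\n"; // hex dump of the GB2312 bytes
echo $back, "\n";
```

Returning false instead of the unconverted string makes a failed detection visible to the caller, which matters when the input comes from a remote page or feed whose encoding is not known in advance.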
